Gene expression profiling for identification, monitoring and treatment of rheumatoid arthritis

ABSTRACT

A method is provided in various embodiments for determining a profile data set for a subject with rheumatoid arthritis or inflammatory conditions related to rheumatoid arthritis based on a sample from the subject, wherein the sample provides a source of RNAs. The method includes using amplification for measuring the amount of RNA corresponding to at least 2 constituents from Tables 1-2 and Tables 4-10. The profile data set comprises the measure of each constituent, and amplification is performed under measurement conditions that are substantially repeatable.

REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Ser. No. 60/721,052, filed Sep. 27, 2005, which is incorporated herein by reference in its entirety.

FIELD OF THE INVENTION

The present invention relates generally to the identification of biological markers associated with rheumatoid arthritis (RA). More specifically, the present invention relates to the use of gene expression data in the identification, monitoring and treatment of rheumatoid arthritis and in the characterization and evaluation of inflammatory conditions induced by or related to rheumatoid arthritis.

BACKGROUND OF THE INVENTION

Rheumatoid arthritis (RA) is an autoimmune disease that causes chronic inflammation of the joints, the tissue around the joints, and other organs of the body. Because it can affect multiple organs, rheumatoid arthritis is a systemic illness and is sometimes called rheumatoid disease. It is a chronic disease mainly characterized by inflammation of the lining, or synovium, of the joints, and it can lead to long-term joint damage resulting in chronic pain, loss of function and disability.

RA can start in any joint, but it most commonly begins in the smaller joints of the fingers, hands and wrists. In general, more joint erosion indicates more severe disease activity. RA progresses in three stages. The first stage is swelling of the synovial lining, causing pain, warmth, stiffness, redness and swelling around the joint. Other common physical symptoms include fatigue, weakness, pain associated with prolonged sitting, flares of disease activity followed by remission, rheumatoid nodules (lumps of tissue under the skin, typically found on the elbows), and muscle pain. The second stage is the rapid division and growth of cells, or pannus, which causes the synovium to thicken. In the third stage, the inflamed cells release enzymes that may digest bone and cartilage, often causing the involved joint to lose its shape and alignment, with more pain and loss of movement.

Because it is a chronic disease, RA continues indefinitely and may not go away, and frequent flares in disease activity can occur. Early diagnosis and treatment of RA are critical to living a productive life. Studies have shown that early aggressive treatment of RA can limit joint damage, which in turn limits loss of movement, decreased ability to work, higher medical costs and potential surgery. Currently, the characterization of a disease condition related to RA (including diagnosis, staging, monitoring disease progression, and monitoring treatment effects on disease activity) is imprecise. There is no single test that positively indicates that a subject has RA. A diagnosis typically is made from a combination of the following procedures and tests: a medical history; a physical exam looking for common features reported in RA, including but not limited to joint swelling, joint tenderness, loss of motion in the joints, and joint malalignment; signs of RA in other organs, including but not limited to the skin, lungs and eyes; lab tests, including but not limited to blood cell count, erythrocyte sedimentation rate (ESR), C-reactive protein (CRP) levels, rheumatoid factor, and antinuclear antibodies; and imaging studies, including but not limited to X-rays, magnetic resonance imaging (MRI), joint ultrasound, and bone densitometry (DEXA). Thus a need exists for better ways to diagnose rheumatoid arthritis and to monitor its progression and treatment.

Several therapeutic options exist for the treatment of RA. The major goals of therapy are to relieve pain, swelling, and fatigue, to improve joint function, to stop joint damage, and to prevent disability and disease-related morbidity. Treatment of the disease may involve a combination of two or more therapeutic compounds, including non-steroidal anti-inflammatory drugs (“NSAIDs”, e.g., ibuprofen), COX-2 inhibitors (e.g., celecoxib (Celebrex)), low-dose corticosteroids, disease-modifying anti-rheumatic drugs (“DMARDs”, e.g., methotrexate, leflunomide, gold thiomalate, aurothioglucose, or auranofin), tumor necrosis factor (“TNF”) inhibitors (e.g., etanercept (Enbrel), infliximab (Remicade), and adalimumab (Humira)), interleukin-1 inhibitors (e.g., injectable anakinra (Kineret)), and other biologic response modifiers (“BRMs”). Careful monitoring of DMARD and BRM therapy is, however, essential to treatment. Information on the condition of a particular patient and on the patient's response to types and dosages of therapeutic or nutritional agents has become an important issue in clinical medicine today, not only for the efficiency of medical practice in the health care industry but also for improved outcomes and benefits for patients. Thus, there is a need for tests that can aid in the diagnosis of RA and in monitoring its progression and treatment.

SUMMARY OF THE INVENTION

The invention is based in part upon the identification of gene expression profiles associated with rheumatoid arthritis (RA). These genes are referred to herein as RA-associated genes. More specifically, the invention is based upon the surprising discovery that detection of as few as two RA-associated genes is capable of identifying individuals with or without RA with at least 75% accuracy.

In various aspects the invention provides a method for determining a profile data set for characterizing a subject with rheumatoid arthritis or an inflammatory condition related to rheumatoid arthritis based on a sample from the subject, the sample providing a source of RNAs, by using amplification for measuring the amount of RNA in a panel of constituents including at least 2 constituents from any of Tables 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 and arriving at a measure of each constituent. The profile data set contains the measure of each constituent of the panel.

Also provided by the invention is a method of characterizing rheumatoid arthritis or an inflammatory condition related to rheumatoid arthritis in a subject, based on a sample from the subject, the sample providing a source of RNAs, by assessing a profile data set of a plurality of members, each member being a quantitative measure of the amount of a distinct RNA constituent in a panel of constituents selected so that measurement of the constituents enables characterization of rheumatoid arthritis.

In yet another aspect the invention provides a method of characterizing rheumatoid arthritis or an inflammatory condition related to rheumatoid arthritis in a subject, based on a sample from the subject, the sample providing a source of RNAs, by determining a quantitative measure of the amount of at least one constituent from Tables 4 and 7.

The panel of constituents is selected so as to distinguish between a normal subject and an RA-diagnosed subject. The RA-diagnosed subject has been washed out from therapy, for example for at least 1 week, 2 weeks, 3 weeks, 1 month, 2 months, or up to 3 or more months. Alternatively, the panel of constituents is selected so as to permit characterizing the severity of RA in relation to a normal subject over time, so as to track movement toward normal as a result of successful therapy and away from normal in response to a symptomatic flare. In other aspects of the invention, the panel of constituents is selected so as to distinguish, e.g., classify, stable RA subjects from unstable RA subjects. By a stable RA subject is meant that the subject was responsive to the therapeutic being administered. By an unstable RA subject is meant that the disease was not responding to the therapeutic being administered. Thus the methods of the invention are used to determine the efficacy of treatment of a particular subject.

Preferably, the panel of constituents is selected so as to distinguish, e.g., classify, between a normal and an RA-diagnosed subject, or between a stable and an unstable subject, with at least 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99% or greater accuracy. By “accuracy” is meant that the method has the ability to distinguish, e.g., classify, between subjects having rheumatoid arthritis, or an inflammatory condition associated with rheumatoid arthritis, and those that do not. When the method is used to determine whether the subject is stable or unstable, “accuracy” means that the method has the ability to distinguish, e.g., classify, between subjects that are responding to therapy and those that are not. Accuracy is determined, for example, by comparing the results of the Gene Expression Profiling to standard accepted clinical methods of diagnosing RA, e.g., one or more symptoms of RA such as tender and swollen joints, fatigue, pain and stiffness in the joints, inflammation, or increased serum CRP levels.
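The accuracy comparison described above can be illustrated with a minimal sketch. It assumes that each subject already has a panel-based call and a reference call obtained by standard clinical methods; the function name classification_accuracy and the example calls are hypothetical and are not study data.

```python
from typing import Sequence


def classification_accuracy(predicted: Sequence[bool], clinical: Sequence[bool]) -> float:
    """Return the fraction of subjects whose panel-based call agrees with
    the reference clinical call (hypothetical helper).

    True denotes "RA" (or "unstable"); False denotes "normal" (or "stable").
    """
    if not predicted or len(predicted) != len(clinical):
        raise ValueError("inputs must be non-empty and of equal length")
    agree = sum(p == c for p, c in zip(predicted, clinical))
    return agree / len(predicted)


# Hypothetical calls for eight subjects (illustrative only).
panel_calls = [True, True, False, False, True, False, False, True]
clinical_calls = [True, True, False, False, False, False, False, True]

acc = classification_accuracy(panel_calls, clinical_calls)
print(f"accuracy = {acc:.1%}")  # 7 of 8 calls agree: 87.5%
print("meets 75% threshold:", acc >= 0.75)
```

Simple agreement is only one way to summarize accuracy; sensitivity and specificity could be reported separately when the two groups differ greatly in size.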

The panel contains 10, 8, 5, 4, 3 or fewer constituents. Optimally, the panel of constituents includes TLR2, MMP9, IFI16, TGFB1, NFKB1, TIMP1, ICAM1, STAT3, CSPG2 or HLADRA. The panel includes two or more constituents from Table 5. Preferably, the panel includes any 2, 3, 4, or 5 genes in the combinations shown in Tables 5, 6, 8, 9 and 10, respectively. For example, the panel contains i) TLR2 and one or more of the following: CD4, PTGS2, IL18BP, HSPA1A, HMGB1, C1QA, MNDA, CD19, CD86, SERPING1, CD8A, PTPRC, MYC, NFKB1, TNFSF5, LTA, TGFB1, DPP4, EGR1, IL1R1, ICAM1, IL1RN, TIMP1, MNDA, MPO, GCLC, APAF1, MMP9, TNFSF6, PLA2G7, CYBB, CD14, SERPINE1, HLADRA, MEF2C, MMP9, CASP9 or IL1B; ii) MMP9 and one or more of the following: SERPING1, PTGS2, IFI16, HSPA1A, CD4, C1QA, MNDA, IL18BP, IL1R1, MYC, APAF1, CD86, CD19, SERPINE1, HMGB1, MPO, NFKB1, TGFB1, EGR1, PLAUR, TNFSF5, SERPINA1, LTA, TIMP1, ICAM1, TNF, TLR2, IL1B, IL1RN, IL18, ADAM17, PTPRC, CD14, HMOX1, or CD8A; iii) IFI16 and one or more of the following: HMGB1, SERPINE1, CD19, IL1R, NFKB1, MPO, MYC, TIMP1, IL18BP, SERPINE1, CD19, ELA2, TGFB1, IL10, C1QA, PTGS2, ADAM17, IL18, CD4, HMOX1, CD86, HSPA1A, MNDA, TLR2, or MMP9; iv) TGFB1 and one or more of the following: CD4, IL18BP, PTGS2, NFKB1, TLR2, IFI16, IL1R1, IL10, SERPINA1, SERPING1, MMP9, HLADRA, HSPA1A or ICAM1; v) NFKB1 and one or more of the following: CD4, IL18BP, TLR2, MMP9, IL-10, IFI16, TIMP1, CD14, IL1R1, CYBB, SERPING1, PTGS2, MYC, SERPINA1, EGR1 or TNFSF5; vi) TIMP1 and one or more of the following: CD4, MYC, SERPING1, IFI16, SERPINA1, EGR1, or TNFSF5; vii) ICAM1 and one or more of the following: HLADRA, HSPA1A, CD14, TGFBR2, MMP9, TGFB1, CSPG2, STAT3, MEF2C, IL18, CD4 or NFB1B; viii) STAT3 and one or more of the following: HLADRA, HSPA1A, CD14, TGFBR2, MMP9, TGFB1, CSPG2, ICAM1 or EGR1; ix) CSPG2 and one or more of the following: HLADRA, IL18, CD14, HSPA1A, IL1B, EGR1, TGFB1, CASP9, ITGAL, STAT3, EGR1, ICAM1, CD4, or MEF2C; x) HLADRA and one or more of the following: CASP9, MEF2C, ITGAL, IL18, NFKB1B, CD4, NFKB1, TGFBR2, SERPINE1, CD14, HSPA1A or TLR2.

Optionally, assessing may further include comparing the profile data set to a baseline profile data set for the panel. The baseline profile data set is related to the rheumatoid arthritis or inflammatory condition related to rheumatoid arthritis to be characterized. The baseline profile data set may be derived from one or more other samples from the same subject, taken when the subject is in a biological condition different from that in which the subject was at the time the first sample was taken, with respect to at least one of age, nutritional history, medical condition, clinical indicator, medication, physical activity, body mass, and environmental exposure; alternatively, the baseline profile data set may be derived from one or more other samples from one or more different subjects. In addition, the one or more different subjects may have in common with the subject at least one of age group, gender, ethnicity, geographic location, nutritional history, medical condition, clinical indicator, medication, physical activity, body mass, and environmental exposure. A clinical indicator may be used to assess rheumatoid arthritis or an inflammatory condition related to rheumatoid arthritis of the one or more different subjects, and assessing may also include interpreting the calibrated profile data set in the context of at least one other clinical indicator, wherein the at least one other clinical indicator includes blood chemistry, X-ray or other radiological or metabolic imaging technique, other chemical assays, and physical findings.

The baseline profile data set may be derived from one or more other samples from the same subject taken under circumstances different from those of the first sample, and the circumstances may be selected from the group consisting of (i) the time at which the first sample is taken, (ii) the site from which the first sample is taken, and (iii) the biological condition of the subject when the first sample is taken.

By rheumatoid arthritis or an inflammatory condition related to rheumatoid arthritis is meant that the condition is an autoimmune condition, an environmental condition, a viral infection, a bacterial infection, a eukaryotic parasitic infection, or a fungal infection.

The sample is any sample derived from a subject that contains RNA. For example, the sample is blood, a blood fraction, body fluid, or a population of cells or tissue from the subject.

Optionally, one or more other samples can be taken over an interval of time that is at least one month between the first sample and the one or more other samples, or over an interval of time that is at least twelve months between the first sample and the one or more other samples, or they may be taken pre-therapy intervention or post-therapy intervention. In such embodiments, the first sample may be derived from blood and the baseline profile data set may be derived from tissue or body fluid of the subject other than blood. Alternatively, the first sample is derived from tissue or bodily fluid of the subject and the baseline profile data set is derived from blood.

All of the foregoing embodiments are carried out wherein the measurement conditions are substantially repeatable, particularly within a degree of repeatability of better than five percent, or more particularly within a degree of repeatability of better than three percent, and/or wherein efficiencies of amplification for all constituents are substantially similar, more particularly wherein the efficiencies of amplification are within two percent of one another, and still more particularly wherein the efficiencies of amplification for all constituents differ by less than one percent.
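The repeatability and efficiency criteria above can be checked numerically. The following is a minimal sketch under stated assumptions: "degree of repeatability" is taken here to mean the coefficient of variation of replicate measurements, efficiency is estimated from a dilution-series standard curve with the conventional qPCR relation E = 10^(-1/slope) - 1, and "within two percent" is read as the absolute spread of the efficiencies. The helper names and all numbers are hypothetical.

```python
import statistics


def coefficient_of_variation(replicates):
    """Sample standard deviation of replicate measurements divided by their mean."""
    return statistics.stdev(replicates) / statistics.mean(replicates)


def amplification_efficiency(log10_input, ct_values):
    """Estimate efficiency from a standard curve of Ct versus log10(input amount).

    Uses E = 10**(-1/slope) - 1; a slope of about -3.32 cycles per decade
    corresponds to E close to 1.0, i.e. doubling on every cycle.
    """
    mx = statistics.mean(log10_input)
    my = statistics.mean(ct_values)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_input, ct_values))
    sxx = sum((x - mx) ** 2 for x in log10_input)
    slope = sxy / sxx  # ordinary least squares slope
    return 10 ** (-1 / slope) - 1


def efficiencies_similar(efficiencies, tolerance=0.02):
    """True if all constituent efficiencies lie within `tolerance` of each other."""
    return max(efficiencies) - min(efficiencies) <= tolerance


# Hypothetical triplicate Ct values and a five-point, ten-fold dilution series.
replicate_ct = [24.10, 24.25, 23.98]
dilution_log10 = [0, 1, 2, 3, 4]
dilution_ct = [30.00, 26.68, 23.36, 20.04, 16.72]

print(f"CV = {coefficient_of_variation(replicate_ct):.2%}")
eff = amplification_efficiency(dilution_log10, dilution_ct)
print(f"efficiency = {eff:.3f}")
print("efficiencies similar:", efficiencies_similar([eff, 0.99, 1.005]))
```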

Additionally, the invention includes storing the profile data set in a digital storage medium. Optionally, storing the profile data set includes storing it as a record in a database.
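As one possible illustration of storing a profile data set as database records, the sketch below uses Python's standard sqlite3 module with one row per constituent measure. The table layout, the function name store_profile, the subject identifier and the measure values are all hypothetical choices made for the example.

```python
import sqlite3


def store_profile(conn, subject_id, sample_date, profile):
    """Store one profile data set as rows (subject, date, constituent, measure)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS profile (
               subject_id  TEXT NOT NULL,
               sample_date TEXT NOT NULL,
               constituent TEXT NOT NULL,
               measure     REAL NOT NULL,
               PRIMARY KEY (subject_id, sample_date, constituent)
           )"""
    )
    conn.executemany(
        "INSERT INTO profile VALUES (?, ?, ?, ?)",
        [(subject_id, sample_date, gene, value) for gene, value in profile.items()],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the example
store_profile(conn, "S001", "2005-09-27", {"TLR2": 15.2, "CD4": 14.8})
rows = conn.execute(
    "SELECT constituent, measure FROM profile WHERE subject_id = ? ORDER BY constituent",
    ("S001",),
).fetchall()
print(rows)  # [('CD4', 14.8), ('TLR2', 15.2)]
```

Keying rows on subject, sample date and constituent lets later samples from the same subject be stored alongside earlier ones, so a baseline profile data set can be retrieved by query.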

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.

Other features and advantages of the invention will be apparent from the following detailed description and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A and 1B compare two different populations using Gene Expression Profiles (with respect to 48 loci of the Inflammation Gene Expression Panel included in Tables 1 and 2).

FIG. 2 compares a normal population with a rheumatoid arthritis population derived from a longitudinal study.

FIG. 3 also shows the effect over time, on inflammatory gene expression in a single human subject suffering from rheumatoid arthritis, of the administration of a TNF-inhibiting compound, but here the expression is shown in comparison to the cognate locus average previously determined for the normal (i.e., undiagnosed, healthy) population.

FIG. 4A further illustrates the consistency of inflammatory gene expression in a population; FIG. 4B shows the normal distribution of index values obtained from an undiagnosed population; FIG. 4C illustrates the use of the same index as FIG. 4B, where the inflammation median for a normal population has been set to zero and both normal and diseased subjects are plotted in standard deviation units relative to that median.
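The rescaling described for FIG. 4C can be sketched as follows, assuming that the standard deviation unit is the sample standard deviation of the normal population's index values; the figure text does not state which spread estimate is used. The function name to_sd_units and the index values are hypothetical.

```python
import statistics


def to_sd_units(normal_index_values, index_values):
    """Express index values relative to the normal-population median, in units
    of the normal population's sample standard deviation (an assumption)."""
    center = statistics.median(normal_index_values)
    spread = statistics.stdev(normal_index_values)
    return [(v - center) / spread for v in index_values]


# Hypothetical index values (illustrative only).
normals = [1.0, 2.0, 3.0, 4.0, 5.0]
subjects = [3.0, 6.0, 9.0]
print([round(z, 2) for z in to_sd_units(normals, subjects)])
```

After this transformation the normal median sits at zero, so a diseased subject's value reads directly as the number of normal-population standard deviations away from that median.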

FIG. 5 plots, in a fashion similar to that of FIG. 4A, Gene Expression Profiles, for the same 7 loci as in FIG. 4A, of two different (responder v. non-responder) 6-subject populations of rheumatoid arthritis patients.

FIG. 6 illustrates application of an algorithm (shown in the figure), providing an index pertinent to rheumatoid arthritis (RA), as applied respectively to normal subjects, RA patients, and bacteremia patients.

FIG. 7 thus illustrates use of the inflammation index for assessment of a single subject suffering from rheumatoid arthritis, who has not responded well to traditional therapy with methotrexate.

FIG. 8 similarly illustrates use of the inflammation index for assessment of three subjects suffering from rheumatoid arthritis, who have not responded well to traditional therapy with methotrexate.

FIGS. 9-11 show the inflammation index for an international group of subjects, suffering from rheumatoid arthritis, undergoing three separate treatment regimens.

FIG. 12 illustrates application of a statistical T-test to identify potential members of a signature gene expression panel that is capable of distinguishing between normal subjects and subjects suffering from unstable rheumatoid arthritis.

FIG. 13 illustrates biomarkers identified with RA that show increased gene expression values relative to normal gene expression values for patients unstable on DMARD therapy or stable on DMARD therapy, compared to patients on TNF inhibitor therapy, who exhibit more normal gene expression values.

FIG. 14 illustrates how patients with active RA exhibit statistically significant differences in gene expression values relative to normal gene expression values.

FIG. 15 illustrates another study with active RA patients exhibiting statistically significant increased gene expression relative to normal gene expression values.

FIG. 16 illustrates how RA subjects stable for at least 3 months on TNF inhibitors exhibit normal gene expression.

FIG. 17 illustrates how biomarkers are effective at tracking effective therapy in RA patients.

FIG. 18 illustrates inflammation index values for patients clinically stable on methotrexate, at the beginning of the study.

FIG. 19 illustrates inflammation index values for patients clinically stable on the TNF inhibitor Enbrel, at the beginning of the study.

FIG. 20 illustrates inflammation index values for patients clinically stable on the TNF inhibitor Remicade, at the beginning of the study.

FIG. 21 illustrates inflammation index values for patients clinically stable on the TNF inhibitor Remicade, 4 weeks into the study.

FIG. 22 illustrates aberrant gene expression values for patient 01a in FIGS. 50 and 58, resulting from an active RA flare 1 week after sample collection for gene expression analysis.

FIG. 23 illustrates a determination of associations of clinical endpoints for RA therapy.

FIG. 24 illustrates a determination of associations of RA biomarkers and clinical endpoints using simple correlation analysis.

FIG. 25 illustrates a mixed model analysis used to compare individual biomarkers for predicting RA status relative to traditional physician-determined DAS values.

FIG. 26 illustrates a study of RA patients unstable on methotrexate and the relationship between gene expression (ΔCt) and the Physician's Assessment of Disease (DAS), using simple correlation or mixed model analyses.

FIG. 27 illustrates a scatterplot of CD4 by TLR2 for 132 Normals and 22 RAs.

FIG. 28 illustrates a scatterplot of CD4 by NFKB1 for 133 Normals and 22 RAs (the plot separately identifies Normal #55, which is missing on TLR2).

FIG. 29 illustrates a scatterplot of CD4 by tPN (=TLR2+NFKB1) for 132 Normals and 22 RAs. The appended line shows that RAs are perfectly distinguished from Normals on the basis of CD4 and tPN.

FIG. 30 illustrates the best least squares fitting line and lower 95% predicted confidence limit to the Normals data, showing the expected value for tPN given a particular expression level for CD4.

FIG. 31 illustrates the best least squares fitting line and lower 95% predicted confidence limit to the Normals data, showing the expected value for tPN2 given a particular expression level for CD4.
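The least squares line and lower prediction limit described for FIGS. 30 and 31 can be sketched as below, with tPN formed as TLR2 + NFKB1 as stated for FIG. 29. This is an illustrative reconstruction, not the analysis used for the figures: the limit is assumed to be one-sided, and the Student t quantile with n - 2 degrees of freedom is approximated by the standard normal quantile, which is reasonable only for large groups such as the 132 Normals. The function name and all data values are hypothetical.

```python
import math
import statistics


def fit_with_lower_prediction_limit(x, y, alpha=0.05):
    """Ordinary least squares fit of y on x plus a one-sided lower prediction limit.

    The t quantile is approximated by the standard normal quantile
    (adequate for large samples only).
    """
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard error
    z = statistics.NormalDist().inv_cdf(1 - alpha)

    def lower_limit(x0):
        # Standard error for predicting a new observation at x0.
        se_pred = s * math.sqrt(1 + 1 / n + (x0 - mx) ** 2 / sxx)
        return intercept + slope * x0 - z * se_pred

    return slope, intercept, lower_limit


# Hypothetical Normal-subject values (five points only to keep the example short).
cd4 = [1.0, 2.0, 3.0, 4.0, 5.0]
tlr2 = [1.0, 2.0, 3.0, 4.0, 5.0]
nfkb1 = [1.1, 1.9, 3.1, 4.0, 4.9]
tpn = [a + b for a, b in zip(tlr2, nfkb1)]  # tPN = TLR2 + NFKB1

slope, intercept, lower_limit = fit_with_lower_prediction_limit(cd4, tpn)
print(f"tPN ~ {intercept:.2f} + {slope:.2f} * CD4")
print(f"lower limit at CD4 = 3.0: {lower_limit(3.0):.2f}")
```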

FIG. 32 illustrates a comparison of Normals from 2 studies on expression levels for CD4 and TLR2.

FIG. 33 illustrates a comparison of RAs from 2 studies with respect to expression levels of CD4 and TLR2.

FIG. 34 illustrates a comparison of Normals vs. Unstable RAs with respect to CD4 and TLR2 expression levels.

FIG. 35 illustrates data for both washed-out and unstable RA studies together.

FIG. 36 illustrates the approximate 95% ellipsoid for normals data.
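The text does not state how the approximate 95% ellipsoid of FIG. 36 was constructed. One common construction, offered here only as an assumption, treats the Normals data as bivariate normal and compares the squared Mahalanobis distance of a point with the chi-square quantile for 2 degrees of freedom, which equals -2 ln(1 - coverage). Estimation uncertainty in the mean and covariance is ignored, which is why the region is only approximate. The function name and the points are hypothetical.

```python
import math
import statistics


def inside_95_ellipse(normal_points, query, coverage=0.95):
    """Test whether a 2-D point lies inside the approximate coverage ellipse of
    the normal data, using the squared Mahalanobis distance and the
    chi-square quantile with 2 degrees of freedom."""
    xs = [p[0] for p in normal_points]
    ys = [p[1] for p in normal_points]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    vx, vy = statistics.variance(xs), statistics.variance(ys)
    cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    det = vx * vy - cxy ** 2  # determinant of the sample covariance matrix
    dx, dy = query[0] - mx, query[1] - my
    d2 = (vy * dx * dx - 2 * cxy * dx * dy + vx * dy * dy) / det
    # For 2 degrees of freedom the chi-square quantile is -2 ln(1 - coverage).
    threshold = -2 * math.log(1 - coverage)
    return d2 <= threshold


# Hypothetical normal-subject points (illustrative only).
normals = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0), (1.0, 1.0), (-1.0, -1.0)]
print(inside_95_ellipse(normals, (0.0, 0.0)))
print(inside_95_ellipse(normals, (5.0, -5.0)))
```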

FIG. 37 illustrates the relationship between tPN and CD4 for all cases in the washed-out study plus 11 MS subjects, showing that separation is apparent between 3 MS subjects who look like RA subjects and the other 8 MS subjects who look more like Normals.

FIG. 38 illustrates original data with a very large amount of error added (s=2).

FIG. 39 illustrates original data with a large amount of error added (s=1).

FIG. 40 illustrates original data with a moderate amount of error added (s=0.5).

FIG. 41 illustrates original data with a small amount of error added (s=0.2).
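The error-perturbation figures can be mimicked with a short simulation. The sketch assumes, without confirmation from the text, that s is the standard deviation of zero-mean Gaussian error added independently to each measured value; a fixed seed keeps each perturbation reproducible. The function name add_error and the expression values are hypothetical.

```python
import random


def add_error(values, s, seed=0):
    """Return a copy of `values` with independent Gaussian error (mean 0, SD s) added."""
    rng = random.Random(seed)  # private generator so results are reproducible
    return [v + rng.gauss(0.0, s) for v in values]


# Hypothetical expression values (illustrative only).
original = [14.2, 15.1, 13.8, 16.0]
for s in (2, 1, 0.5, 0.2):
    noisy = add_error(original, s)
    print(s, [round(v, 2) for v in noisy])
```

Re-running a separation rule on each perturbed copy indicates how much measurement error the rule tolerates before RA and Normal subjects begin to overlap.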

DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS

Definitions

The following terms shall have the meanings indicated unless the context otherwise requires:

“Algorithm” is a set of rules for describing a biological condition. The rule set may be defined exclusively algebraically but may also include alternative or multiple decision points requiring domain-specific knowledge, expert interpretation or other clinical indicators.

An “agent” is a “composition” or a “stimulus”, as those terms are defined herein, or a combination of a composition and a stimulus.

“Amplification” in the context of a quantitative RT-PCR assay is a function of the number of DNA replications that are tracked to provide a quantitative determination of the concentration of the target. “Amplification” here refers to a degree of sensitivity and specificity of a quantitative assay technique. Accordingly, amplification provides a measurement of concentrations of constituents that is evaluated under conditions wherein the efficiency of amplification, and therefore the degree of sensitivity and reproducibility for measuring all constituents, is substantially similar.

A “baseline profile data set” is a set of values associated with constituents of a Gene Expression Panel resulting from evaluation of a biological sample (or population or set of samples) under a desired biological condition that is used for mathematically normative purposes. The desired biological condition may be, for example, the condition of a subject (or population or set of subjects) before exposure to an agent or in the presence of an untreated disease or in the absence of a disease. Alternatively, or in addition, the desired biological condition may be health of a subject or a population or set of subjects. Alternatively, or in addition, the desired biological condition may be that associated with a population or set of subjects selected on the basis of at least one of age group, gender, ethnicity, geographic location, nutritional history, medical condition, clinical indicator, medication, physical activity, body mass, and environmental exposure.

A “set” or “population” of samples or subjects refers to a defined or selected group of samples or subjects wherein there is an underlying commonality or relationship between the members included in the set or population of samples or subjects.

A “population of cells” refers to any group of cells wherein there is an underlying commonality or relationship between the members in the population of cells, including, for example, a group of cells taken from an organism or from a culture of cells or from a biopsy.

A “biological condition” of a subject is the condition of the subject in a pertinent realm that is under observation, and such realm may include any aspect of the subject capable of being monitored for change in condition, such as health; disease including cancer; trauma; aging; infection; tissue degeneration; developmental steps; physical fitness; obesity; and mood. As can be seen, a condition in this context may be chronic or acute or simply transient. Moreover, a targeted biological condition may be manifest throughout the organism or population of cells or may be restricted to a specific organ (such as skin, heart, eye or blood), but in either case, the condition may be monitored directly by a sample of the affected population of cells or indirectly by a sample derived elsewhere from the subject. The term “biological condition” includes a “physiological condition”.

“Body fluid” of a subject includes blood, urine, spinal fluid, lymph, mucosal secretions, prostatic fluid, semen, haemolymph or any other body fluid known in the art for a subject.

“Calibrated profile data set” is a function of a member of a first profile data set and a corresponding member of a baseline profile data set for a given constituent in a panel.
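The definition leaves the calibrating function open. As a minimal sketch, assuming Ct-type measures (which are on a log2 scale of starting amount), the function below takes the difference between each profile member and the corresponding baseline member; a difference on this log scale is equivalent to a ratio of amounts. The fold-change conversion further assumes amplification efficiency near 100%. Function names and values are hypothetical.

```python
def calibrate(profile, baseline):
    """Calibrated profile: profile[g] - baseline[g] for each shared constituent."""
    return {g: profile[g] - baseline[g] for g in profile if g in baseline}


def fold_change(delta_ct):
    """Convert a Ct difference into a fold change, assuming doubling per cycle.

    A lower Ct means more starting RNA, so a difference of -1 maps to 2-fold higher.
    """
    return 2.0 ** (-delta_ct)


sample = {"TLR2": 15.0, "CD4": 14.0, "MMP9": 17.5}  # hypothetical values
normal_baseline = {"TLR2": 16.0, "CD4": 14.0}       # hypothetical values
cal = calibrate(sample, normal_baseline)
print(cal)                       # {'TLR2': -1.0, 'CD4': 0.0}
print(fold_change(cal["TLR2"]))  # 2.0
```

Constituents absent from the baseline (MMP9 here) are left out rather than compared against a missing value.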

A “clinical indicator” is any physiological datum used alone or in conjunction with other data in evaluating the physiological condition of a collection of cells or of an organism. This term includes pre-clinical indicators.

A “composition” includes a chemical compound, a nutraceutical, a pharmaceutical, a homeopathic formulation, an allopathic formulation, a naturopathic formulation, a combination of compounds, a toxin, a food, a food supplement, a mineral, and a complex mixture of substances, in any physical state or in a combination of physical states.

To “derive” a profile data set from a sample includes determining a set of values associated with constituents of a Gene Expression Panel either (i) by direct measurement of such constituents in a biological sample or (ii) by measurement of such constituents in a second biological sample that has been exposed to the original sample or to matter derived from the original sample.

“Distinct RNA or protein constituent” in a panel of constituents is a distinct expressed product of a gene, whether RNA or protein. An “expression” product of a gene includes the gene product, whether RNA or the protein resulting from translation of the messenger RNA.

A “Gene Expression Panel” is an experimentally verified set of constituents, each constituent being a distinct expressed product of a gene, whether RNA or protein, wherein constituents of the set are selected so that their measurement provides a measurement of a targeted biological condition.

A “Gene Expression Profile” is a set of values associated with constituents of a Gene Expression Panel resulting from evaluation of a biological sample (or population or set of samples).

A “Gene Expression Profile Inflammatory Index” is the value of an index function that provides a mapping from an instance of a Gene Expression Profile into a single-valued measure of inflammatory condition.
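An index function only needs to map a profile to a single value. The sketch below assumes a weighted linear combination of constituent measures plus an offset; this form, the function name inflammation_index and the weights are hypothetical, and the specific algorithm shown in FIG. 6 is not reproduced here.

```python
def inflammation_index(profile, weights, offset=0.0):
    """Map a Gene Expression Profile to one number as a weighted sum (assumed form).

    Constituents without a weight are ignored; a weighted constituent missing
    from the profile raises KeyError rather than being silently skipped.
    """
    return offset + sum(w * profile[g] for g, w in weights.items())


# Hypothetical weights and profile values (illustrative only).
weights = {"TLR2": 0.5, "CD4": -1.0, "NFKB1": 0.25}
profile = {"TLR2": 2.0, "CD4": 1.0, "NFKB1": 4.0, "MMP9": 3.0}
print(inflammation_index(profile, weights))
```

Raising an error for a missing weighted constituent is a deliberate choice in this sketch, since silently dropping a term would change the scale of the index between samples.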

The “health” of a subject includes mental, emotional, physical, spiritual, allopathic, naturopathic and homeopathic condition of the subject.

“Index” is an arithmetically or mathematically derived numerical characteristic developed for aid in simplifying or disclosing or informing the analysis of more complex quantitative information. A disease or population index may be determined by the application of a specific algorithm to a plurality of subjects or samples with a common biological condition.

“Inflammation” is used herein in the general medical sense of the word and refers to an acute or chronic; simple or suppurative; localized or disseminated; cellular and tissue response initiated or sustained by any number of chemical, physical or biological agents or combination of agents.

“Inflammatory state” is used to indicate the relative biological condition of a subject resulting from inflammation, or characterizing the degree of inflammation.

A “large number” of data sets based on a common panel of genes is a number of data sets sufficiently large to permit a statistically significant conclusion to be drawn with respect to an instance of a data set based on the same panel.

“Multiple sclerosis” (MS) is a debilitating wasting disease. The disease is associated with degeneration of the myelin sheaths surrounding nerve cells, which leads to a loss of motor and sensory function.

A “normative” condition of a subject to whom a composition is to be administered means the condition of a subject before administration, even if the subject happens to be suffering from a disease.

A “panel” of genes is a set of genes including at least two constituents.

“Rheumatoid Arthritis” (RA) is a chronic (long-term) autoimmune disease that causes inflammation of the joints and surrounding tissues, causing joint damage and deformity. It can also affect other organs.

A “sample” from a subject may include a single cell or multiple cells or fragments of cells or an aliquot of body fluid, taken from the subject, by means including venipuncture, excretion, ejaculation, massage, biopsy, needle aspirate, lavage sample, scraping, surgical incision or intervention or other means known in the art.

A “Signature Profile” is an experimentally verified subset of a Gene Expression Profile selected to discriminate a biological condition, agent or physiological mechanism of action.

A “Signature Panel” is a subset of a Gene Expression Panel, the constituents of which are selected to permit discrimination of a biological condition, agent or physiological mechanism of action.

A “subject” is a cell, tissue, or organism, human or non-human, whether in vivo, ex vivo or in vitro, under observation. As used herein, reference to evaluating the biological condition of a subject based on a sample from the subject includes using blood or another tissue sample from a human subject to evaluate the human subject's condition; it also includes, for example, using a blood sample itself as the subject to evaluate, for example, the effect of therapy or an agent upon the sample.

A “stimulus” includes (i) a monitored physical interaction with a subject, for example ultraviolet A or B, or light therapy for seasonal affective disorder, or treatment of psoriasis with psoralen or treatment of melanoma with embedded radioactive seeds, other radiation exposure, and (ii) any monitored physical, mental, emotional, or spiritual activity or inactivity of a subject.

“Therapy” includes all interventions, whether biological, chemical, physical, metaphysical, or a combination of the foregoing, intended to sustain or alter the monitored biological condition of a subject.

“Washed-out RA” refers to a subject diagnosed with RA who has undergone one or more forms of therapeutic treatment, whereby the therapeutic is discontinued for a specified period of time based upon the pharmacokinetic properties of the therapeutic treatment administered, and whereby the specified period of time comprises 1 week, 2 weeks, 3 weeks, 1 month, 2 months, or up to 3 or more months.

The PCT patent application publication number WO 01/25473, published Apr. 12, 2001, entitled “Systems and Methods for Characterizing a Biological Condition or Agent Using Calibrated Gene Expression Profiles,” filed for an invention by inventors herein, and which is herein incorporated by reference, discloses the use of Gene Expression Panels for the evaluation of (i) biological condition (including with respect to health and disease) and (ii) the effect of one or more agents on biological condition (including with respect to health, toxicity, therapeutic treatment and drug interaction).

In particular, Gene Expression Panels may be used for measurement of therapeutic efficacy of natural or synthetic compositions or stimuli that may be formulated individually or in combinations or mixtures for a range of targeted biological conditions; prediction of toxicological effects and dose effectiveness of a composition or mixture of compositions for an individual or for a population or set of individuals or for a population of cells; determination of how two or more different agents administered in a single treatment might interact so as to detect any of synergistic, additive, negative, neutral or toxic activity; performing pre-clinical and clinical trials by providing new criteria for pre-selecting subjects according to informative profile data sets for revealing disease status; and conducting preliminary dosage studies for these patients prior to conducting phase 1 or 2 trials. These Gene Expression Panels may be employed with respect to samples derived from subjects in order to evaluate their biological condition.

The present invention provides Gene Expression Panels for the evaluation or characterization of rheumatoid arthritis and inflammatory conditions related to rheumatoid arthritis in a subject. In addition, the Gene Expression Profiles described herein also provide for the evaluation of the effect of one or more agents for the treatment of rheumatoid arthritis and inflammatory conditions related to rheumatoid arthritis.

This Gene Expression Panel is referred to herein as the “RA Expression Panel”. An RA Expression Panel includes one or more genes, e.g., constituents, listed in Tables 1-2 and Tables 4-10. Each gene of the RA Expression Panel is referred to herein as an RA associated gene or an RA associated constituent.

By evaluating or characterizing rheumatoid arthritis is meant diagnosing rheumatoid arthritis, assessing the risk of developing rheumatoid arthritis, or assessing the prognosis of a subject with rheumatoid arthritis. Similarly, by evaluating or characterizing an agent for treatment of rheumatoid arthritis is meant identifying agents suitable for the treatment of rheumatoid arthritis. The agents can be compounds known to treat RA or compounds that have not been shown to treat RA.

Rheumatoid arthritis and inflammatory conditions related to rheumatoid arthritis are evaluated by determining the level of expression (e.g., a quantitative measure) of one or more RA genes. The level of expression is determined by any means known in the art, such as, for example, quantitative PCR. The measurement is obtained under conditions that are substantially repeatable. Optionally, the quantitative measure of the constituent is compared to a baseline level (e.g., a baseline profile set). A baseline level is a level of expression of the constituent in one or more subjects known not to be suffering from rheumatoid arthritis (e.g., a healthy individual). Alternatively, the baseline level is derived from one or more subjects known to be suffering from rheumatoid arthritis. Optionally, the baseline level is derived from the same subject from which the first measure is derived. For example, the baseline is taken from a subject prior to receiving treatment for RA, or at different time periods during a course of treatment. Such methods allow for the evaluation of a particular treatment for a selected individual. Comparison can be performed on test (e.g., patient) and reference (e.g., baseline) samples measured concurrently or at temporally distinct times. An example of the latter is the use of compiled expression information, e.g., a sequence database, which assembles information about expression levels of RA genes.
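As a minimal sketch of the baseline comparison described above, the Python function below scores each constituent of a test sample against a baseline built from several reference subjects. The z-score approach, the cutoff of 2.0, the function name, and the ΔCt values are illustrative assumptions. They are not the scoring method of this disclosure.

```python
from statistics import mean, stdev


def compare_to_baseline(test_dct, baseline_dcts, z_cutoff=2.0):
    """Score each constituent of a test sample against a baseline set.

    test_dct      -- dict: gene -> normalized delta-Ct measured in the test sample
    baseline_dcts -- dict: gene -> list of delta-Ct values from baseline subjects
                     (e.g., subjects known not to have RA)
    z_cutoff      -- hypothetical threshold for flagging a constituent as changed
    Returns dict: gene -> (z-score, flagged_as_changed)
    """
    result = {}
    for gene, value in test_dct.items():
        reference = baseline_dcts[gene]
        # stdev() needs at least two baseline values per gene
        mu, sd = mean(reference), stdev(reference)
        z = (value - mu) / sd
        result[gene] = (z, abs(z) > z_cutoff)
    return result


# Hypothetical delta-Ct values, for illustration only.
baseline = {"TLR2": [10.0, 10.2, 9.8], "CD4": [12.0, 12.4, 11.6]}
patient = {"TLR2": 10.9, "CD4": 12.2}
scores = compare_to_baseline(patient, baseline)
```

Here TLR2 lies about 4.5 baseline standard deviations from the reference mean and is flagged. CD4 lies about 0.5 standard deviations away and is not.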

A change in the expression pattern of an RA gene in the patient-derived sample compared to the normal baseline level indicates that the subject is suffering from or is at risk of developing rheumatoid arthritis. In contrast, when the methods are applied prophylactically, a level of an RA gene in the patient-derived sample similar to the normal control level indicates that the subject is neither suffering from nor at risk of developing rheumatoid arthritis. Conversely, a similarity in the expression pattern of an RA gene in the patient-derived sample compared to the RA baseline level indicates that the subject is suffering from or is at risk of developing rheumatoid arthritis.

Measuring expression of an effective amount of an RA gene also allows the course of treatment of rheumatoid arthritis to be monitored. In this method, a biological sample is provided from a subject undergoing treatment; e.g., if desired, biological samples are obtained from the subject at various time points before, during, or after treatment. Expression of an effective amount of the RA gene is then determined and compared to a baseline profile. The baseline profile may be taken or derived from one or more individuals who have been exposed to the treatment. Alternatively, the baseline level may be taken or derived from one or more individuals who have not been exposed to the treatment. For example, samples may be collected from subjects who have received initial treatment for rheumatoid arthritis and subsequent treatment for rheumatoid arthritis to monitor the progress of the treatment.

Differences in the genetic makeup of individuals can result in differences in their relative abilities to metabolize various drugs. Accordingly, the RA Gene Expression Panels disclosed herein allow a putative therapeutic or prophylactic agent to be tested in a sample from a selected subject in order to determine if the agent is suitable for treating or preventing rheumatoid arthritis in the subject.

To identify a therapeutic that is appropriate for a specific subject, a test sample from the subject is exposed to a candidate therapeutic agent, and the expression of one or more RA genes is determined. A subject sample is incubated in the presence of a candidate agent, and the pattern of RA gene expression in the test sample is measured and compared to a baseline profile, e.g., a rheumatoid arthritis baseline profile, a non-rheumatoid arthritis baseline profile, or an index value. The test agent can be any compound or composition.

If the reference sample (e.g., baseline) is from a subject that does not have rheumatoid arthritis, a similarity in the pattern of RA genes between the test sample and the reference sample indicates that the treatment is efficacious. However, a change in the pattern of RA genes between the test sample and the reference sample indicates a less favorable clinical outcome or prognosis.

By “efficacious” is meant that the treatment leads to a decrease in a sign or symptom of rheumatoid arthritis in the subject. Assessment of rheumatoid arthritis is made using standard clinical protocols. Efficacy is determined in association with any known method for diagnosing or treating rheumatoid arthritis.

A Gene Expression Panel is selected in a manner so that quantitative measurement of RNA or protein constituents in the Panel constitutes a measurement of a biological condition of a subject. In one kind of arrangement, a calibrated profile data set is employed. Each member of the calibrated profile data set is a function of (i) a measure of a distinct constituent of a Gene Expression Panel and (ii) a baseline quantity.
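A ΔΔCt calculation is one simple way to realize a calibrated profile data set. This assumes that each measure is a normalized ΔCt and that the baseline quantity is a baseline ΔCt for the same constituent. The sketch below is illustrative only. The fold-change conversion assumes roughly 100% amplification efficiency, and the gene values are hypothetical.

```python
def calibrated_profile(sample_dct, baseline_dct):
    """Build a calibrated profile data set with one entry per constituent.

    sample_dct   -- dict: gene -> normalized delta-Ct in the sample
    baseline_dct -- dict: gene -> baseline delta-Ct for the same gene
    Each entry is (delta-delta-Ct, fold change). The fold change 2 ** -ddCt
    assumes ~100% amplification efficiency and delta-Ct defined as
    Ct(gene) - Ct(internal control), so a lower delta-Ct means more transcript.
    """
    profile = {}
    for gene, dct in sample_dct.items():
        ddct = dct - baseline_dct[gene]
        profile[gene] = (ddct, 2.0 ** (-ddct))
    return profile


# Hypothetical values for two constituents named in this document.
profile = calibrated_profile({"ICAM1": 8.0, "HLADRA": 5.5},
                             {"ICAM1": 9.0, "HLADRA": 4.5})
# profile -> {'ICAM1': (-1.0, 2.0), 'HLADRA': (1.0, 0.5)}
```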

It has been discovered that valuable and unexpected results may be achieved when the quantitative measurement of constituents is performed under repeatable conditions (within a degree of repeatability of measurement of better than twenty percent, and preferably ten percent or better, more preferably five percent or better, and more preferably three percent or better). For the purposes of this description and the following claims, a degree of repeatability of measurement of better than twenty percent is regarded as providing measurement conditions that are “substantially repeatable”. In particular, it is desirable that each time a measurement is obtained corresponding to the level of expression of a constituent in a particular sample, substantially the same measurement should result for substantially the same level of expression. In this manner, expression levels for a constituent in a Gene Expression Panel may be meaningfully compared from sample to sample. Even if the expression level measurements for a particular constituent are inaccurate (for example, 30% too low), the criterion of repeatability means that all measurements for this constituent, if skewed, will nevertheless be skewed systematically, and therefore measurements of expression level of the constituent may be compared meaningfully. In this fashion, valuable information may be obtained and compared concerning expression of the constituent under varied circumstances.
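The point about systematic skew can be checked numerically. In this sketch, which uses hypothetical values, every reading from a repeatable but inaccurate assay is 30% too low. The levels of the constituent relative to a reference sample are nonetheless unchanged:

```python
import math


def relative_levels(measurements):
    """Express each sample's measured level relative to the first sample."""
    reference = measurements[0]
    return [m / reference for m in measurements]


# Hypothetical true expression levels of one constituent in three samples.
true_levels = [100.0, 250.0, 50.0]
# A repeatable but inaccurate assay: every reading is 30% too low.
skewed = [0.7 * level for level in true_levels]

# The constant factor cancels, so sample-to-sample comparisons survive.
same = all(math.isclose(a, b)
           for a, b in zip(relative_levels(true_levels), relative_levels(skewed)))
```

A random, non-repeatable error would not cancel in this way, which is why repeatability rather than absolute accuracy is the criterion emphasized here.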

In addition to the criterion of repeatability, it is desirable that a second criterion also be satisfied, namely that quantitative measurement of constituents is performed under conditions wherein efficiencies of amplification for all constituents are substantially similar as defined herein. When both of these criteria are satisfied, measurement of the expression level of one constituent may be meaningfully compared with measurement of the expression level of another constituent in a given sample and from sample to sample.

Additional embodiments relate to the use of an index or algorithm resulting from quantitative measurement of constituents, and optionally in addition, derived from either expert analysis or computational biology (a) in the analysis of complex data sets; (b) to control or normalize the influence of uninformative or otherwise minor variances in gene expression values between samples or subjects; (c) to simplify the characterization of a complex data set for comparison to other complex data sets, databases or indices or algorithms derived from complex data sets; (d) to monitor a biological condition of a subject; (e) for measurement of therapeutic efficacy of natural or synthetic compositions or stimuli that may be formulated individually or in combinations or mixtures for a range of targeted biological conditions; (f) for predictions of toxicological effects and dose effectiveness of a composition or mixture of compositions for an individual or for a population or set of individuals or for a population of cells; (g) for determination of how two or more different agents administered in a single treatment might interact so as to detect any of synergistic, additive, negative, neutral or toxic activity; and (h) for performing pre-clinical and clinical trials by providing new criteria for pre-selecting subjects according to informative profile data sets for revealing disease status and conducting preliminary dosage studies for these patients prior to conducting phase 1 or 2 trials.

Gene expression profiling and the use of index characterization for a particular condition or agent or both may be used to reduce the cost of phase 3 clinical trials and may be used beyond phase 3 trials; labeling for approved drugs; selection of suitable medication in a class of medications for a particular patient that is directed to their unique physiology; diagnosing or determining a prognosis of a medical condition or an infection which may precede onset of symptoms or alternatively diagnosing adverse side effects associated with administration of a therapeutic agent; managing the health care of a patient; and quality control for different batches of an agent or a mixture of agents.

The Subject

The methods disclosed here may be applied to cells of humans, mammals or other organisms without the need for undue experimentation by one of ordinary skill in the art because all cells transcribe RNA and it is known in the art how to extract RNA from all types of cells.

A subject can include those who have not been previously diagnosed as having rheumatoid arthritis or an inflammatory condition related to rheumatoid arthritis. Alternatively, a subject can also include those who have already been diagnosed as having rheumatoid arthritis or an inflammatory condition related to rheumatoid arthritis. Diagnosis of RA is made, for example, from any one or combination of the following procedures: a medical history; a physical exam looking for common features reported in RA, including but not limited to joint swelling, joint tenderness, loss of motion in the joints, and joint malalignment; signs of RA in other organs, including but not limited to skin, lungs and eyes; lab tests including but not limited to measuring blood cell count, erythrocyte sedimentation rate, C-Reactive Protein levels, Rheumatoid Factor, and/or Antinuclear Antibodies; and imaging studies including but not limited to X-rays, MRI, joint ultrasound, and/or bone densitometry.

Optionally, the subject has been previously treated with therapeutic agents, or with other therapies and treatment regimens for rheumatoid arthritis or an inflammatory condition related to rheumatoid arthritis, including but not limited to any one or combination of the following therapeutics: NSAIDs, e.g., ibuprofen; COX-2 inhibitors, e.g., celecoxib (Celebrex); low dose corticosteroids; DMARDs, e.g., methotrexate, leflunomide, gold thiomalate, aurothioglucose, or auranofin; TNF inhibitors, e.g., etanercept (Enbrel), infliximab (Remicade) and adalimumab (Humira); interleukin-1 inhibitors, e.g., injectable anakinra (Kineret); and other biologic response modifiers.

A subject can also include those who are suffering from, or at risk of developing, rheumatoid arthritis or an inflammatory condition related to rheumatoid arthritis, such as those who exhibit known risk factors for rheumatoid arthritis or an inflammatory condition related to rheumatoid arthritis. Known risk factors for RA include, but are not limited to, blood transfusions, age (increased susceptibility between ages 25-55), gender (women are 2.5 to 3 times more likely to develop RA than men), family history of RA or other autoimmune disorders, ethnic background (increased susceptibility in Caucasians and Native Americans), and obesity. Subjects suffering from or at risk of developing RA often exhibit the gradual worsening of symptoms which include but are not limited to fatigue, morning stiffness (lasting more than one hour), diffuse muscular aches, loss of appetite, and weakness. Eventually, joint pain appears, with warmth, swelling, tenderness, and stiffness of the joint after inactivity.

Selecting Constituents of a Gene Expression Panel

The general approach to selecting constituents of a Gene Expression Panel has been described in PCT application publication number WO 01/25473, incorporated herein in its entirety. A wide range of Gene Expression Panels have been designed and experimentally verified, each panel providing a quantitative measure of biological condition that is derived from a sample of blood or other tissue. For each panel, experiments have verified that a Gene Expression Profile using the panel's constituents is informative of a biological condition. (It has also been demonstrated that, in being informative of biological condition, the Gene Expression Profile is used, among other things, to measure the effectiveness of therapy, as well as to provide a target for therapeutic intervention.)

Tables 1-2 and Tables 4-10, listed below, include relevant genes which may be selected for a given Gene Expression Panel, such as the Gene Expression Panels demonstrated herein to be useful in the evaluation of rheumatoid arthritis and inflammatory conditions related to rheumatoid arthritis. Table 1 is a panel of 83 genes whose expression is associated with Rheumatoid Arthritis or inflammatory conditions related to Rheumatoid Arthritis. Table 2 is a panel of 103 genes whose expression is associated with Inflammation.

Tables 4-6 were derived from a longitudinal study of RA patients after initiating Interleukin-1 receptor antagonist (IL1ra) or IL1ra plus soluble TNF-α receptor 1 (sTNFR1) therapy, described in Example 5 below. Table 4 is a panel of genes derived from latent class modeling of the subjects from this study using a 1-gene model to distinguish between subjects suffering from RA and normal subjects. Tables 5-6 are panels of gene models derived from latent class modeling of the subjects from this study using a 2-gene and 3-gene model, respectively, to distinguish between subjects suffering from RA and normal subjects. Constituent models selected from Tables 5 and 6 are capable of correctly classifying RA-afflicted and/or normal subjects with at least 75% accuracy. For example, in Table 5, it can be seen that the 2-gene model TLR2 and CD4 correctly classifies RA-afflicted subjects with 91% accuracy, and normal subjects with 98% accuracy. The 2-gene model TLR2 and C1QA correctly classifies RA-afflicted subjects with 100% accuracy, and normal subjects with 96% accuracy. In Table 6, the 3-gene model TLR2, CD4, and NFKB1 correctly classifies both RA-afflicted subjects and normal subjects with 100% accuracy. The 3-gene model TLR2, CD4, and MYC correctly classifies RA subjects with 96% accuracy and normal subjects with 99% accuracy.

Tables 7-10 were derived from a longitudinal study of RA patients after initiating NSAID, MTX or new TNF-inhibitor therapy, described in Example 6 below. Table 7 is a panel of genes derived from latent class modeling of the subjects from this study using a 1-gene model to distinguish between subjects suffering from RA and normal subjects. Tables 8-10 are panels of gene models derived from latent class modeling of the subjects from this study using a 2-gene, 3-gene, and 4-gene model, respectively, to distinguish between subjects suffering from RA and normal subjects. Constituent models selected from Tables 8-10 are capable of correctly classifying RA-afflicted and/or normal subjects with at least 75% accuracy. For example, in Table 8, the 2-gene model ICAM1 and HLADRA correctly classifies RA-afflicted subjects with 90% accuracy, and normal subjects with 91% accuracy. In Table 9, the 3-gene model ICAM1, HLADRA, and HSPA1A correctly classifies RA-afflicted subjects with 95% accuracy, and normal subjects with 97% accuracy. In Table 10, the 4-gene model ICAM1, HLADRA, HSPA1A, and TGFB1 correctly classifies both RA-afflicted and normal subjects with 100% accuracy.
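The per-class accuracies quoted for Tables 5-6 and 8-10 are the percentage of RA-afflicted subjects, and separately of normal subjects, that a model assigns to the correct class. The sketch below computes only that scoring step, using hypothetical labels. It does not implement the latent class modeling used to build the tables. The function names and the way the 75% criterion is applied to every class are assumptions.

```python
def class_accuracies(true_labels, predicted_labels):
    """Percent of subjects in each true class (e.g., 'RA', 'normal') classified correctly."""
    totals, correct = {}, {}
    for truth, pred in zip(true_labels, predicted_labels):
        totals[truth] = totals.get(truth, 0) + 1
        correct[truth] = correct.get(truth, 0) + (truth == pred)
    return {label: 100.0 * correct[label] / totals[label] for label in totals}


def meets_threshold(accuracies, minimum=75.0):
    """True if every class reaches the minimum accuracy (in percent)."""
    return all(a >= minimum for a in accuracies.values())


# Hypothetical cohort: 10 RA subjects (one misclassified) and 4 normal subjects.
truth = ["RA"] * 10 + ["normal"] * 4
predicted = ["RA"] * 9 + ["normal"] + ["normal"] * 4
acc = class_accuracies(truth, predicted)   # {'RA': 90.0, 'normal': 100.0}
```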

In general, panels may be constructed and experimentally verified by one of ordinary skill in the art in accordance with the principles articulated in the present application.

Design of Assays

Typically, a sample is run through a panel in triplicate; that is, a sample is divided into aliquots and for each aliquot the concentrations of each constituent in a Gene Expression Panel are measured. From a total of over 900 constituent assays, with each assay conducted in triplicate, an average coefficient of variation ((standard deviation/average)*100) of less than 2 percent was found among the normalized ΔCt measurements for each assay (where normalized quantitation of the target mRNA is determined by the difference in threshold cycles between the internal control (e.g., an endogenous marker such as 18S rRNA, or an exogenous marker) and the gene of interest). This figure is a measure called “intra-assay variability”. Assays have also been conducted on different occasions using the same sample material. With 72 assays, resulting from concentration measurements of constituents in a panel of 24 members, and such concentration measurements determined on three different occasions over time, an average coefficient of variation of less than 5 percent, typically less than 2%, was found. This is a measure of “inter-assay variability”. Preferably, the average coefficient of variation is less than 20%, more preferably less than 10%, more preferably less than 5%, more preferably less than 4%, more preferably less than 3%, more preferably less than 2%, and even more preferably less than 1%.
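The intra-assay figure above can be computed as sketched below. The sketch assumes that normalized ΔCt is taken as Ct(gene of interest) minus Ct(18S rRNA) for each replicate. The sign convention, the function names, and the Ct values are assumptions for illustration.

```python
from statistics import mean, stdev


def delta_ct(ct_gene, ct_control):
    """Normalized delta-Ct for one replicate (sign convention assumed)."""
    return ct_gene - ct_control


def cv_percent(values):
    """Coefficient of variation: (standard deviation / average) * 100."""
    return stdev(values) / abs(mean(values)) * 100.0


def average_cv(assays):
    """Average CV over many assays; each assay is a list of (Ct_gene, Ct_18S) replicates."""
    cvs = [cv_percent([delta_ct(g, c) for g, c in reps]) for reps in assays]
    return mean(cvs)


# One hypothetical triplicate assay: (Ct of gene of interest, Ct of 18S rRNA).
assay = [(25.1, 12.0), (25.0, 12.0), (25.2, 12.1)]
intra_cv = average_cv([assay])   # roughly 0.44 percent, well under 2 percent
```

The same `average_cv` helper applies to inter-assay variability if each list holds the results for one constituent measured on different occasions.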

It has been determined that it is valuable to use the quadruplicate or triplicate test results to identify and eliminate data points that are statistical “outliers”; such data points are those that differ by a percentage greater than, for example, 3% of the average of all three or four values. Moreover, if more than one data point in a set of three or four is excluded by this procedure, then all data for the relevant constituent are discarded.
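The replicate-outlier rule can be sketched as follows. The sketch assumes that the 3% tolerance is applied to the replicate values themselves (for example, Ct or ΔCt) and that the average includes every replicate, as the text describes. The function name is hypothetical.

```python
from statistics import mean


def filter_replicates(values, tolerance_pct=3.0):
    """Apply the replicate-outlier rule to one constituent.

    A replicate is an outlier if it differs from the average of all
    replicates by more than tolerance_pct percent of that average.
    Returns the retained replicates, or None if more than one replicate
    was excluded (all data for the constituent are then discarded).
    """
    avg = mean(values)
    kept = [v for v in values if abs(v - avg) / abs(avg) * 100.0 <= tolerance_pct]
    if len(values) - len(kept) > 1:
        return None
    return kept


# Hypothetical quadruplicate: the 22.0 reading deviates by about 7% and is dropped.
kept = filter_replicates([20.0, 20.1, 20.0, 22.0])   # [20.0, 20.1, 20.0]
# Hypothetical triplicate: two readings deviate by 15%, so the constituent is discarded.
discarded = filter_replicates([20.0, 23.0, 17.0])    # None
```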

Measurement of Gene Expression for a Constituent in the Panel

For measuring the amount of a particular RNA in a sample, methods known to one of ordinary skill in the art were used to extract and quantify transcribed RNA from a sample with respect to a constituent of a Gene Expression Panel. (See detailed protocols below. Also see PCT application publication number WO 98/24935, herein incorporated by reference, for RNA analysis protocols.) Briefly, RNA is extracted from a sample such as a tissue, body fluid, or culture medium in which a population of cells of a subject might be growing. For example, cells may be lysed and RNA eluted in a suitable solution in which to conduct a DNAse reaction. First strand synthesis may be performed using a reverse transcriptase. Gene amplification, more specifically quantitative PCR assays, can then be conducted and the gene of interest calibrated against an internal marker such as 18S rRNA (Hirayama et al., Blood 92, 1998: 46-52). Any other endogenous marker can be used, such as 28S-25S rRNA and 5S rRNA. Samples are measured in multiple replicates, for example, 3 replicates. In an embodiment of the invention, quantitative PCR is performed using amplification, reporting agents and instruments such as those supplied commercially by Applied Biosystems (Foster City, Calif.). Given a defined efficiency of amplification of target transcripts, the point (e.g., cycle number) at which signal from amplified target template is detectable may be directly related to the amount of specific message transcript in the measured sample. Similarly, other quantifiable signals such as fluorescence, enzyme activity, disintegrations per minute, absorbance, etc., when correlated to a known concentration of target templates (e.g., a reference standard curve) or normalized to a standard with limited variability, can be used to quantify the number of target templates in an unknown sample.
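The relationship between the detection cycle and the starting amount of template can be made concrete under the standard exponential amplification model, N = N₀(1 + E)^Ct, where E is the amplification efficiency. The sketch below assumes that both samples are read at the same fluorescence threshold. The function names are hypothetical.

```python
import math


def relative_quantity(ct_sample, ct_reference, efficiency=1.0):
    """Starting template in a sample relative to a reference sample.

    Assumes both reach the same detection threshold and that each cycle
    multiplies product by (1 + efficiency), so N_threshold = N0 * (1 + E) ** Ct.
    efficiency=1.0 means perfect doubling per cycle.
    """
    return (1.0 + efficiency) ** (ct_reference - ct_sample)


def cycles_for_fold(fold, efficiency=1.0):
    """Ct shift expected for a given fold difference in starting template."""
    return math.log(fold) / math.log(1.0 + efficiency)


# A sample crossing threshold 2 cycles earlier holds 4x the template (at E = 1.0).
rq = relative_quantity(22.0, 24.0)   # -> 4.0
shift = cycles_for_fold(8.0)         # about 3 cycles for an 8-fold difference
```

The dependence on `efficiency` in both functions is why the text requires a defined, and across constituents similar, amplification efficiency.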

Although not limited to amplification methods, quantitative gene expression techniques may utilize amplification of the target transcript. Alternatively or in combination with amplification of the target transcript, quantitation of the reporter signal for an internal marker generated by the exponential increase of amplified product may also be used. Amplification of the target template may be accomplished by isothermic gene amplification strategies, or by gene amplification by thermal cycling such as PCR.

It is desirable to obtain a definable and reproducible correlation between the amplified target or reporter signal, i.e., internal marker, and the concentration of starting templates. It has been discovered that this objective can be achieved by careful attention to, for example, consistent primer-template ratios and a strict adherence to a narrow permissible level of experimental amplification efficiencies (for example 90.0 to 100% +/−5% relative efficiency, typically 99.8 to 100% relative efficiency). For example, in determining gene expression levels with regard to a single Gene Expression Profile, it is necessary that all constituents of the panels, including endogenous controls, maintain similar amplification efficiencies, as defined herein, to permit accurate and precise relative measurements for each constituent. Amplification efficiencies are regarded as being “substantially similar”, for the purposes of this description and the following claims, if they differ by no more than approximately 10%, preferably by less than approximately 5%, more preferably by less than approximately 3%, and more preferably by less than approximately 1%. Measurement conditions are regarded as being “substantially repeatable”, for the purposes of this description and the following claims, if they differ by no more than approximately +/−10% coefficient of variation (CV), preferably by less than approximately +/−5% CV, more preferably +/−2% CV. These constraints should be observed over the entire range of concentration levels to be measured associated with the relevant biological condition.
While it is thus necessary for various embodiments herein to satisfy criteria that measurements are achieved under measurement conditions that are substantially repeatable and wherein specificity and efficiencies of amplification for all constituents are substantially similar, nevertheless, it is within the scope of the present invention as claimed herein to achieve such measurement conditions by adjusting assay results that do not satisfy these criteria directly, in such a manner as to compensate for errors, so that the criteria are satisfied after suitable adjustment of assay results.
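Amplification efficiency is commonly estimated from a serial dilution by fitting Ct against log10 of the input amount and applying E = 10^(-1/slope) - 1. The sketch below uses that common relation, which this document does not itself specify. It also interprets "differ by no more than approximately 10%" as a spread of at most 10 percentage points. Both choices are assumptions.

```python
import math


def efficiency_from_standard_curve(log10_input, ct):
    """Amplification efficiency from a dilution series.

    Fits Ct against log10(input) by least squares and applies the common
    relation E = 10 ** (-1 / slope) - 1; a slope near -3.32 gives E near 1.0.
    """
    n = len(ct)
    mx, my = sum(log10_input) / n, sum(ct) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_input, ct))
    sxx = sum((x - mx) ** 2 for x in log10_input)
    slope = sxy / sxx
    return 10.0 ** (-1.0 / slope) - 1.0


def substantially_similar(efficiencies, max_spread=0.10):
    """True if all efficiencies lie within max_spread (0.10 = 10 points) of each other."""
    return max(efficiencies) - min(efficiencies) <= max_spread


# Hypothetical ideal dilution series: each 10-fold step shifts Ct by log2(10) cycles.
doses = [0, 1, 2, 3, 4]                                   # log10 of input amount
cts = [30.0 - x / math.log10(2) for x in doses]
e_ideal = efficiency_from_standard_curve(doses, cts)      # close to 1.0 (100%)
```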

In practice, tests are run to assure that these conditions are satisfied. For example, all primer-probe sets are designed in house, and experimentation is performed to determine which set gives the best performance. Even though primer-probe design can be enhanced using computer techniques known in the art, and notwithstanding common practice, it has been found that experimental validation is still useful. Moreover, in the course of experimental validation, the selected primer-probe combination is associated with a set of features:

The reverse primer should be complementary to the coding DNA strand. In one embodiment, the primer should be located across an intron-exon junction, with not more than four bases of the three-prime end of the reverse primer complementary to the proximal exon. (If more than four bases are complementary, then it would tend to competitively amplify genomic DNA.)

In an embodiment of the invention, the primer probe set should amplify cDNA of less than 110 bases in length and should not amplify, or generate fluorescent signal from, genomic DNA or transcripts or cDNA from related but biologically irrelevant loci.
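The two design rules above can be expressed as automated checks on a candidate primer-probe set. This sketch uses a hypothetical data model with 1-based sense-strand coordinates. It cannot replace the experimental validation described above, because ruling out amplification of genomic DNA or related loci requires sequence comparison and wet-lab testing.

```python
from dataclasses import dataclass


@dataclass
class PrimerProbeSet:
    """Hypothetical description of a candidate primer-probe set.

    Coordinates are 1-based positions on the sense-strand cDNA.
    """
    name: str
    forward_5prime: int             # position of the forward primer's 5' base
    reverse_5prime: int             # sense-strand position matching the reverse primer's 5' base
    reverse_3prime_exon_bases: int  # reverse-primer 3' bases complementary to the proximal exon

    @property
    def amplicon_length(self) -> int:
        return self.reverse_5prime - self.forward_5prime + 1


def design_problems(pps, max_amplicon=110, max_exon_bases=4):
    """List violations of the amplicon-length and 3'-end rules (empty list = passes)."""
    problems = []
    if pps.amplicon_length >= max_amplicon:
        problems.append(f"{pps.name}: amplicon {pps.amplicon_length} bases is not under {max_amplicon}")
    if pps.reverse_3prime_exon_bases > max_exon_bases:
        problems.append(f"{pps.name}: reverse primer 3' end may prime genomic DNA")
    return problems


ok_set = PrimerProbeSet("set_A", forward_5prime=101, reverse_5prime=195, reverse_3prime_exon_bases=3)
bad_set = PrimerProbeSet("set_B", forward_5prime=101, reverse_5prime=230, reverse_3prime_exon_bases=6)
```

Here `set_A` yields a 95-base amplicon and passes. `set_B` fails both checks.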

A suitable target of the selected primer probe is first strand cDNA, which may be prepared, in one embodiment, as follows:

(a) Use of Whole Blood for Ex Vivo Assessment of a Biological Condition Affected by an Agent.

Human blood is obtained by venipuncture and prepared for assay by separating samples for baseline, no exogenous stimulus, and pro-inflammatory stimulus with sufficient volume for at least three time points. Typical pro-inflammatory stimuli include lipopolysaccharide (LPS), phytohemagglutinin (PHA), heat-killed staphylococci (HKS), carrageenan, IL-2 plus toxic shock syndrome toxin-1 (TSST1), or cytokine cocktails, and may be used individually or in combination. The aliquots of heparinized whole blood are mixed with additional test therapeutic compounds and held at 37° C. in an atmosphere of 5% CO₂ for 30 minutes. Stimulus is added at varying concentrations, mixed and held loosely capped at 37° C. for the prescribed time course. At defined times, cells are lysed and RNA is extracted by various standard means.

Nucleic acids, RNA and/or DNA, are purified from cells, tissues or fluids of the test population of cells or indicator cell lines. RNA is preferentially obtained from the nucleic acid mix using a variety of standard procedures (see, e.g., RNA Isolation Strategies, pp. 55-104, in RNA Methodologies, A laboratory guide for isolation and characterization, 2nd edition, 1998, Robert E. Farrell, Jr., Ed., Academic Press), in the present case using a filter-based RNA isolation system from Ambion (RNAqueous™, Phenol-free Total RNA Isolation Kit, Catalog #1912, version 9908; Austin, Tex.).

In accordance with one procedure, the whole blood assay for Gene Expression Profile determination was carried out as follows: Human whole blood was drawn into 10 mL Vacutainer tubes with Sodium Heparin. Blood samples were mixed by gently inverting tubes 4-5 times. The blood was used within 10-15 minutes of draw. In the experiments, blood was diluted 2-fold, i.e., per sample per time point, 0.6 mL whole blood + 0.6 mL stimulus. The assay medium was prepared and the stimulus added as appropriate.

A quantity (0.6 mL) of whole blood was then added into each 12×75 mm polypropylene tube. Then 0.6 mL of 2× LPS (from E. coli serotype O127:B8, Sigma #L3880, or serotype O55, Sigma #L4005; 10 ng/mL, subject to change in different lots) was added to the LPS tubes. Next, 0.6 mL of assay medium was added to the “control” tubes. The caps were closed tightly. The tubes were inverted 2-3 times to mix samples. Caps were loosened to the first stop and the tubes incubated at 37° C., 5% CO₂ for 6 hours. At 6 hours, samples were gently mixed to resuspend blood cells, and 0.15 mL was removed from each tube (using a micropipettor with barrier tip), transferred to 0.15 mL of lysis buffer, and mixed. Lysed samples were extracted using an ABI 6100 Nucleic Acid Prepstation following the manufacturer's recommended protocol.

The samples were then centrifuged for 5 min at 500× g, ambient temperature (IEC centrifuge or equivalent, in microfuge tube adapters in a swinging bucket), and as much serum as possible was removed from each tube and discarded. Cell pellets were placed on ice, and RNA was extracted as soon as possible using an Ambion RNAqueous kit.

(b) Amplification Strategies.

Specific RNAs are amplified using message specific primers or random primers. The specific primers are synthesized from data obtained from public databases (e.g., Unigene, National Center for Biotechnology Information, National Library of Medicine, Bethesda, Md.), including information from genomic and cDNA libraries obtained from humans and other animals. Primers are chosen to preferentially amplify from specific RNAs obtained from the test or indicator samples (see, for example, RT PCR, Chapter 15 in RNA Methodologies, A laboratory guide for isolation and characterization, 2nd edition, 1998, Robert E. Farrell, Jr., Ed., Academic Press; or Chapter 22, pp. 143-151, RNA isolation and characterization protocols, Methods in Molecular Biology, Volume 86, 1998, R. Rapley and D. L. Manning, Eds., Humana Press; or 14 in Statistical refinement of primer design parameters, Chapter 5, pp. 55-72, PCR applications: protocols for functional genomics, M. A. Innis, D. H. Gelfand and J. J. Sninsky, Eds., 1999, Academic Press). Amplifications are carried out either in isothermic conditions or using a thermal cycler (for example, an ABI 9600, 9700 or 7900 obtained from Applied Biosystems, Foster City, Calif.; see Nucleic acid detection methods, pp. 1-24, in Molecular methods for virus detection, D. L. Wiedbrauk and D. H. Farkas, Eds., 1995, Academic Press). Amplified nucleic acids are detected using fluorescent-tagged detection oligonucleotide probes (see, for example, Taqman™ PCR Reagent Kit, Protocol, part number 402823 revision A, 1996, Applied Biosystems, Foster City, Calif.) that are identified and synthesized from publicly known databases as described for the amplification primers. In the present case, amplified cDNA is detected and quantified using the ABI Prism 7900 Sequence Detection System obtained from Applied Biosystems (Foster City, Calif.).
Amounts of specific RNAs contained in the test sample or obtained from the indicator cell lines can be related to the relative quantity of fluorescence observed (see, for example, Advances in quantitative PCR technology: 5′ nuclease assays, Y. S. Lie and C. J. Petropoulos, Current Opinion in Biotechnology, 1998, 9:43-48, or Rapid thermal cycling and PCR kinetics, pp. 211-229, chapter 14 in PCR applications: protocols for functional genomics, M. A. Innis, D. H. Gelfand and J. J. Sninsky, Eds., 1999, Academic Press).

Described here in detail, as a particular implementation of the approach, is a procedure for synthesis of first strand cDNA for use in PCR. This procedure can be used for both whole blood RNA and RNA extracted from cultured cells (i.e., THP-1 cells).

Materials

1. Applied Biosystems TAQMAN Reverse Transcription Reagents Kit (P/N 808-0234). Kit Components: 10× TaqMan RT Buffer, 25 mM Magnesium chloride, deoxyNTPs mixture, Random Hexamers, RNase Inhibitor, MultiScribe Reverse Transcriptase (50 U/μL).

2. RNase/DNase free water (DEPC Treated Water from Ambion (P/N 9915G), or equivalent).

Methods

1. Place RNase Inhibitor and MultiScribe Reverse Transcriptase on ice immediately. All other reagents can be thawed at room temperature and then placed on ice.

2. Remove RNA samples from the −80° C. freezer, thaw at room temperature and then place immediately on ice.

3. Prepare the following cocktail of Reverse Transcriptase Reagents for each 100 μL RT reaction (for multiple samples, prepare extra cocktail to allow for pipetting error):

Reagent                  1 reaction (μL)   11X, e.g. 10 samples (μL)
10X RT Buffer                 10.0              110.0
25 mM MgCl₂                   22.0              242.0
dNTPs                         20.0              220.0
Random Hexamers                5.0               55.0
RNase Inhibitor                2.0               22.0
Reverse Transcriptase          2.5               27.5
Water                         18.5              203.5
Total                         80.0              880.0 (80 μL per sample)

4. Bring each RNA sample to a total volume of 20 μL in a 1.5 mL microcentrifuge tube (for example, for THP-1 RNA, remove 10 μL RNA and dilute to 20 μL with RNase/DNase-free water; for whole blood RNA, use 20 μL total RNA) and add 80 μL RT reaction mix from step 3. Mix by pipetting up and down.

5. Incubate sample at room temperature for 10 minutes.

6. Incubate sample at 37° C. for 1 hour.

7. Incubate sample at 90° C. for 10 minutes.

8. Quick spin samples in microcentrifuge.

9. Place sample on ice if doing PCR immediately; otherwise store sample at −20° C. for future use.

10. PCR QC should be run on all RT samples using 18S and β-actin (see SOP 200-020).
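The 11X column in the step 3 table follows from multiplying each per-reaction volume by the number of samples plus one spare reaction. A minimal sketch of that arithmetic, assuming the one-extra-reaction convention implied by the table (10 samples, 11X); the dictionary and function names are illustrative only:

```python
# Per-reaction volumes (μL) for one 100 μL RT reaction, taken from the step 3 table.
RT_COCKTAIL_UL = {
    "10X RT Buffer": 10.0,
    "25 mM MgCl2": 22.0,
    "dNTPs": 20.0,
    "Random Hexamers": 5.0,
    "RNase Inhibitor": 2.0,
    "Reverse Transcriptase": 2.5,
    "Water": 18.5,
}


def scale_cocktail(n_samples: int, extra_reactions: int = 1) -> dict[str, float]:
    """Return cocktail volumes for n_samples plus spare reactions to cover pipetting error."""
    factor = n_samples + extra_reactions
    return {name: volume * factor for name, volume in RT_COCKTAIL_UL.items()}


mix = scale_cocktail(10)
print(mix["25 mM MgCl2"], sum(mix.values()))  # 242.0 880.0
```

Each sample then receives 80 μL of the scaled cocktail, matching the per-sample note in the table.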

The first-strand cDNA described above is used with the primer/probe mixes to measure the constituents of a Gene Expression Panel as follows:

Set up of a 24-gene Human Gene Expression Panel for Inflammation.

Materials

1. 20× Primer/Probe Mix for each gene of interest.

2. 20× Primer/Probe Mix for 18S endogenous control.

3. 2× Taqman Universal PCR Master Mix.

4. cDNA transcribed from RNA extracted from cells.

5. Applied Biosystems 96-Well Optical Reaction Plates.

6. Applied Biosystems Optical Caps, or optical-clear film.

7. Applied Biosystems Prism 7700 or 7900 Sequence Detector.

Methods

1. Make stocks of each Primer/Probe mix containing the Primer/Probe for the gene of interest, Primer/Probe for 18S endogenous control, and 2× PCR Master Mix as follows. Make sufficient excess to allow for pipetting error, e.g., approximately 10% excess. The following example illustrates a typical set up for one gene with quadruplicate samples testing two conditions (2 plates).

Reagent                                  1X (1 well) (μL)
2X Master Mix                                  7.5
20X 18S Primer/Probe Mix                       0.75
20X Gene of interest Primer/Probe Mix          0.75
Total                                          9.0

2. Make stocks of cDNA targets by diluting 95 μL of cDNA into 2000 μL of water. The amount of cDNA is adjusted to give Ct values between 10 and 18, typically between 12 and 16.

3. Pipette 9 μL of Primer/Probe mix into the appropriate wells of an Applied Biosystems 384-Well Optical Reaction Plate.

4. Pipette 10 μL of cDNA stock solution into each well of the Applied Biosystems 384-Well Optical Reaction Plate.

5. Seal the plate with Applied Biosystems Optical Caps, or optical-clear film.

6. Analyze the plate on the ABI Prism 7900 Sequence Detector.
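Step 2 aims the cDNA dilution at Ct values between 10 and 18. A small check of Ct readings against that window might look like the following. The window bounds come from the text; the function name and the adjustment labels are illustrative assumptions, based on the fact that a lower Ct means more template.

```python
def ct_window_status(ct: float, low: float = 10.0, high: float = 18.0) -> str:
    """Classify a Ct value against the target window for the cDNA dilution.

    Ct below the window: template too concentrated (dilute further).
    Ct above the window: too little template (dilute less).
    """
    if ct < low:
        return "too concentrated"
    if ct > high:
        return "too dilute"
    return "ok"


for ct in (9.2, 14.1, 19.5):
    print(ct, ct_window_status(ct))
```

The narrower 12 to 16 range mentioned in step 2 can be checked by passing `low=12.0, high=16.0`.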

Methods herein may also be applied using proteins where sensitive quantitative techniques, such as an Enzyme Linked ImmunoSorbent Assay (ELISA) or mass spectrometry, are available and well-known in the art for measuring the amount of a protein constituent (see WO 98/24935, herein incorporated by reference).

Baseline Profile Data Sets

The analyses of samples from single individuals and from large groups of individuals provide a library of profile data sets relating to a particular panel or series of panels. These profile data sets may be stored as records in a library for use as baseline profile data sets. As the term “baseline” suggests, the stored baseline profile data sets serve as comparators for providing a calibrated profile data set that is informative about a biological condition or agent. Baseline profile data sets may be stored in libraries and classified in a number of cross-referential ways. One form of classification may rely on the characteristics of the panels from which the data sets are derived. Another form of classification may be by particular biological condition, e.g., rheumatoid arthritis. The concept of biological condition encompasses any state in which a cell or population of cells may be found at any one time. This state may reflect geography of samples, sex of subjects or any other discriminator. Some of the discriminators may overlap. The libraries may also be accessed for records associated with a single subject or particular clinical trial. The classification of baseline profile data sets may further be annotated with medical information about a particular subject, a medical condition, and/or a particular agent.

The choice of a baseline profile data set for creating a calibrated profile data set is related to the biological condition to be evaluated, monitored, or predicted, as well as the intended use of the calibrated panel, e.g., to monitor drug development, quality control or other uses. It may be desirable to access baseline profile data sets from the same subject for whom a first profile data set is obtained, or from different subjects, at varying times or exposures to stimuli, drugs or complex compounds; baseline profile data sets may also be derived from like or dissimilar populations or sets of subjects. The baseline profile data set may be a normal, healthy baseline.

The profile data set may arise from the same subject for which the first data set is obtained, where the sample is taken at a separate or similar time, a different or similar site or in a different or similar biological condition. For example, a sample may be taken before stimulation or after stimulation with an exogenous compound or substance, such as before or after therapeutic treatment. The profile data set obtained from the unstimulated sample may serve as a baseline profile data set for the sample taken after stimulation. The baseline data set may also be derived from a library containing profile data sets of a population or set of subjects having some defining characteristic or biological condition. The baseline profile data set may also correspond to some ex vivo or in vitro properties associated with an in vitro cell culture. The resultant calibrated profile data sets may then be stored as a record in a database or library, along with or separate from the baseline profile database and, optionally, the first profile data set, although the first profile data set would normally become incorporated into a baseline profile data set under suitable classification criteria. The remarkable consistency of Gene Expression Profiles associated with a given biological condition makes it valuable to store profile data, which can be used, among other things, for normative reference purposes. The normative reference can serve to indicate the degree to which a subject conforms to a given biological condition (healthy or diseased) and, alternatively or in addition, to provide a target for clinical intervention.

Selected baseline profile data sets may also be used as a standard by which to judge manufacturing lots in terms of efficacy, toxicity, etc. Where the effect of a therapeutic agent is being measured, the baseline data set may correspond to Gene Expression Profiles taken before administration of the agent. Where quality control for a newly manufactured product is being determined, the baseline data set may correspond with a gold standard for that product. However, any suitable normalization techniques may be employed. For example, an average baseline profile data set is obtained from authentic material of a naturally grown herbal nutriceutical and compared over time and over different lots in order to demonstrate consistency, or lack of consistency, in lots of compounds prepared for release.

Calibrated Data

Given the repeatability achieved in measurement of gene expression, described above in connection with “Gene Expression Panels” and “gene amplification”, it was concluded that where differences occur in measurement under such conditions, the differences are attributable to differences in biological condition. Thus, it has been found that calibrated profile data sets are highly reproducible in samples taken from the same individual under the same conditions. Similarly, it has been found that calibrated profile data sets are reproducible in samples that are repeatedly tested. It has also been found repeatedly that calibrated profile data sets obtained when samples from a subject are exposed ex vivo to a compound are comparable to calibrated profile data from a sample that has been exposed to the compound in vivo. Importantly, it has been determined that an indicator cell line treated with an agent can in many cases provide calibrated profile data sets comparable to those obtained from in vivo or ex vivo populations of cells. Moreover, it has been determined that administering a sample from a subject onto indicator cells can provide informative calibrated profile data sets with respect to the biological condition of the subject, including the health, disease states, therapeutic interventions, aging or exposure to environmental stimuli or toxins of the subject.

Calculation of Calibrated Profile Data Sets and Computational Aids

The calibrated profile data set may be expressed in a spreadsheet or represented graphically, for example, in a bar chart or tabular form, but may also be expressed in a three-dimensional representation. The function relating the baseline and profile data may be a ratio expressed as a logarithm. The constituents may be itemized on the x-axis and the logarithmic scale may be on the y-axis. Members of a calibrated data set may be expressed as a positive value representing a relative enhancement of gene expression or as a negative value representing a relative reduction in gene expression with respect to the baseline.
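A minimal sketch of the log-ratio calibration just described, assuming base-10 logarithms and constituent amounts already on a linear scale (for example, relative quantities derived from Ct values, not the raw Ct values themselves). The gene names are real panel constituents, but the amounts are hypothetical.

```python
import math


def calibrate(profile: dict[str, float], baseline: dict[str, float]) -> dict[str, float]:
    """Calibrated member = log10(subject amount / baseline amount) for each shared constituent.

    Positive: relative enhancement over the baseline; negative: relative reduction; 0: no change.
    """
    return {
        gene: math.log10(profile[gene] / baseline[gene])
        for gene in profile
        if gene in baseline
    }


# Hypothetical relative amounts (linear scale, not raw Ct values).
subject = {"TNF": 200.0, "IL10": 50.0, "CD4": 100.0}
reference = {"TNF": 100.0, "IL10": 100.0, "CD4": 100.0}
for gene, value in calibrate(subject, reference).items():
    print(f"{gene}: {value:+.3f}")
# TNF: +0.301
# IL10: -0.301
# CD4: +0.000
```

Plotting these values with constituents on the x-axis gives the bar-chart form described above.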

Each member of the calibrated profile data set should be reproducible within a range with respect to similar samples taken from the subject under similar conditions. For example, the calibrated profile data sets may be reproducible within one order of magnitude with respect to similar samples taken from the subject under similar conditions. More particularly, the members may be reproducible within 20%, and typically within 10%. In accordance with embodiments of the invention, a pattern of increasing, decreasing and no change in relative gene expression from each of a plurality of gene loci examined in the Gene Expression Panel may be used to prepare a calibrated profile set that is informative with regard to a biological condition, biological efficacy of an agent, or treatment conditions, or for comparison to populations or sets of subjects or samples, or for comparison to populations of cells. Patterns of this nature may be used to identify likely candidates for a drug trial, used alone or in combination with other clinical indicators to be diagnostic or prognostic with respect to a biological condition, or used to guide the development of a pharmaceutical or nutriceutical through manufacture, testing and marketing.

The numerical data obtained from quantitative gene expression and numerical data from calibrated gene expression relative to a baseline profile data set may be stored in databases or digital storage media and may be retrieved for purposes including managing patient health care, conducting clinical trials or characterizing a drug. The data may be transferred over physical or wireless networks via the World Wide Web, email, or an internet access site, for example, or by hard copy, so as to be collected and pooled from distant geographic sites.

The method also includes producing a calibrated profile data set for the panel, wherein each member of the calibrated profile data set is a function of a corresponding member of the first profile data set and a corresponding member of a baseline profile data set for the panel, and wherein the baseline profile data set is related to the rheumatoid arthritis or inflammatory conditions related to rheumatoid arthritis to be evaluated, with the calibrated profile data set being a comparison between the first profile data set and the baseline profile data set, thereby providing evaluation of the rheumatoid arthritis or inflammatory conditions related to rheumatoid arthritis of the subject.

In yet other embodiments, the function is a mathematical function other than a simple difference, including a second function of the ratio of the corresponding member of the first profile data set to the corresponding member of the baseline profile data set, or a logarithmic function. In related embodiments, each member of the calibrated profile data set has biological significance if it has a value differing by more than an amount D, where D = F(1.1) − F(0.9), and F is the second function. In such embodiments, the first sample is obtained and the first profile data set quantified at a first location, and the calibrated profile data set is produced using a network to access a database stored on a digital storage medium in a second location, wherein the database may be updated to reflect the first profile data set quantified from the sample. Additionally, using a network may include accessing a global computer network.
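The threshold D = F(1.1) − F(0.9) is the spread of F over a ±10% change in the ratio. The sketch below computes D for a logarithmic F and applies it; reading "differing by more than D" as "differing from the no-change value F(1) by more than D" is an assumption, and the function names are hypothetical.

```python
import math
from typing import Callable


def significance_threshold(f: Callable[[float], float] = math.log10) -> float:
    """D = F(1.1) - F(0.9), the spread of F over a +/-10% change in the ratio."""
    return f(1.1) - f(0.9)


def is_significant(ratio: float, f: Callable[[float], float] = math.log10) -> bool:
    """Treat a calibrated member F(ratio) as significant if it differs from F(1)
    (the no-change value) by more than D. This reading of the rule is an assumption."""
    return abs(f(ratio) - f(1.0)) > significance_threshold(f)


print(round(significance_threshold(), 4))          # 0.0872 (log10)
print(round(significance_threshold(math.log), 4))  # 0.2007 (natural log)
print(is_significant(1.25), is_significant(1.05))  # True False
```

The value of D depends on the choice of F, so the threshold should be computed with the same function used to build the calibrated profile data set.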

In an embodiment of the present invention, a descriptive record is stored in a single database or multiple databases where the stored data includes the raw gene expression data (first profile data set) prior to transformation by use of a baseline profile data set, as well as a record of the baseline profile data set used to generate the calibrated profile data set including, for example, annotations regarding whether the baseline profile data set is derived from a particular Signature Panel and any other annotation that facilitates interpretation and use of the data.

Because the data is in a universal format, data handling may readily be done with a computer. The data is organized so as to provide an output optionally corresponding to a graphical representation of a calibrated data set.

For example, a distinct sample derived from a subject being at least one of RNA or protein may be denoted as $P_i$. The first profile data set derived from sample $P_i$ is denoted $M_j$, where $M_j$ is a quantitative measure of a distinct RNA or protein constituent of $P_i$. The record $R_i$ is a ratio of M and P and may be annotated with additional data on the subject relating to, for example, age, diet, ethnicity, gender, geographic location, medical disorder, mental disorder, medication, physical activity, body mass and environmental exposure. Moreover, data handling may further include accessing data from a second condition database which may contain additional medical data not presently held with the calibrated profile data sets. In this context, data access may be via a computer network.

The above-described data storage on a computer may provide the information in a form that can be accessed by a user. Accordingly, the user may load the information onto a second access site, including downloading the information. However, access may be restricted to users having a password or other security device so as to protect the medical records contained within. A feature of this embodiment of the invention is the ability of a user to add new or annotated records to the data set so the records become part of the biological information.

The graphical representation of calibrated profile data sets pertaining to a product such as a drug provides an opportunity for standardizing a product by means of the calibrated profile, more particularly a signature profile. The profile may be used as a feature with which to demonstrate relative efficacy, differences in mechanisms of action, etc., compared to other drugs approved for similar or different uses.

The various embodiments of the invention may also be implemented as a computer program product for use with a computer system. The product may include program code for deriving a first profile data set and for producing calibrated profiles. Such implementation may include a series of computer instructions fixed either on a tangible medium, such as a computer readable medium (for example, a diskette, CD-ROM, ROM, or fixed disk), or transmittable to a computer system via a modem or other interface device, such as a communications adapter coupled to a network. The network coupling may be, for example, over optical or wired communications lines or via wireless techniques (for example, microwave, infrared or other transmission techniques) or some combination of these. The series of computer instructions preferably embodies all or part of the functionality previously described herein with respect to the system. Those skilled in the art should appreciate that such computer instructions can be written in a number of programming languages for use with many computer architectures or operating systems. Furthermore, such instructions may be stored in any memory device, such as semiconductor, magnetic, optical or other memory devices, and may be transmitted using any communications technology, such as optical, infrared, microwave, or other transmission technologies. It is expected that such a computer program product may be distributed as a removable medium with accompanying printed or electronic documentation (for example, shrink-wrapped software), preloaded with a computer system (for example, on system ROM or fixed disk), or distributed from a server or electronic bulletin board over a network (for example, the Internet or World Wide Web). In addition, a computer system is further provided including derivative modules for deriving a first data set and a calibration profile data set.

The calibration profile data sets in graphical or tabular form, the associated databases, and the calculated index or derived algorithm, together with information extracted from the panels, the databases, the data sets or the indices or algorithms, are commodities that can be sold together or separately for a variety of purposes as described in WO 01/25473.

In other embodiments, a clinical indicator may be used to assess the rheumatoid arthritis or inflammatory conditions related to rheumatoid arthritis of the relevant set of subjects by interpreting the calibrated profile data set in the context of at least one other clinical indicator, wherein the at least one other clinical indicator is selected from the group consisting of blood chemistry, urinalysis, X-ray or other radiological or metabolic imaging technique, other chemical assays, and physical findings.

Index Construction

In combination, (i) the remarkable consistency of Gene Expression Profiles with respect to a biological condition across a population or set of subjects or samples, or across a population of cells, and (ii) the use of procedures that provide substantially reproducible measurement of constituents in a Gene Expression Panel giving rise to a Gene Expression Profile, under measurement conditions wherein specificity and efficiencies of amplification for all constituents of the panel are substantially similar, make possible the use of an index that characterizes a Gene Expression Profile, and which therefore provides a measurement of a biological condition.

An index may be constructed using an index function that maps values in a Gene Expression Profile into a single value that is pertinent to the biological condition at hand. The values in a Gene Expression Profile are the amounts of each constituent of the Gene Expression Panel that corresponds to the Gene Expression Profile. These constituent amounts form a profile data set, and the index function generates a single value (the index) from the members of the profile data set.

The index function may conveniently be constructed as a linear sum of terms, each term being what is referred to herein as a “contribution function” of a member of the profile data set. For example, the contribution function may be a constant times a power of a member of the profile data set. So the index function would have the form

$$I = \sum_i C_i M_i^{P(i)},$$

where I is the index, $M_i$ is the value of member i of the profile data set, $C_i$ is a constant, and P(i) is a power to which $M_i$ is raised, the sum being formed for all integral values of i up to the number of members in the data set. We thus have a linear polynomial expression.
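The index function above can be sketched directly as a sum of contribution functions. The constituent names and constants below are hypothetical placeholders, not values from this document; P = +1 is used for a constituent that rises with the condition and P = −1 for one that falls, as in the simple embodiment described next.

```python
def index_value(
    members: dict[str, float], coeffs: dict[str, float], powers: dict[str, int]
) -> float:
    """I = sum_i C_i * M_i ** P(i), summed over the constituents that have a coefficient."""
    return sum(coeffs[name] * members[name] ** powers[name] for name in coeffs)


# Hypothetical constituents and constants, for illustration only.
members = {"GENE_A": 2.0, "GENE_B": 4.0}
coeffs = {"GENE_A": 0.5, "GENE_B": 1.0}
powers = {"GENE_A": 1, "GENE_B": -1}
print(index_value(members, coeffs, powers))  # 1.25
```

Fitting the $C_i$ and P(i) to clinical data (for example by latent class modeling) is a separate step that this sketch does not attempt.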

The values $C_i$ and P(i) may be determined in a number of ways, so that the index I is informative of the pertinent biological condition. One way is to apply statistical techniques, such as latent class modeling, to the profile data sets to correlate clinical data or experimentally derived data, or other data pertinent to the biological condition. For example, the software from Statistical Innovations, Belmont, Mass., called Latent Gold®, may be employed for this purpose. Alternatively, other simpler modeling techniques may be employed in a manner known in the art. The index function for inflammation may be constructed, for example, in a manner that a greater degree of inflammation (as determined by the profile data set for the Inflammation Gene Expression Panel included in Tables 1 and 2) correlates with a large value of the index function. In a simple embodiment, therefore, each P(i) may be +1 or −1, depending on whether the constituent increases or decreases with increasing inflammation. As discussed in further detail below, a meaningful inflammation index that is proportional to the expression, referred to herein as IR-105, was constructed as follows:

$$\tfrac{1}{4}\{IL1A\} + \tfrac{1}{4}\{IL1B\} + \tfrac{1}{4}\{TNF\} + \tfrac{1}{4}\{INFG\} - 1/\{IL10\},$$

where the braces around a constituent designate measurement of such constituent and the constituents are a subset of the Inflammation Gene Expression Panel included in Tables 1 and 2.

Just as a baseline profile data set, discussed above, can be used to provide an appropriate normative reference, and can even be used to create a calibrated profile data set, as discussed above, based on the normative reference, an index that characterizes a Gene Expression Profile can also be provided with a normative value of the index function used to create the index. This normative value can be determined with respect to a relevant population or set of subjects or samples or to a relevant population of cells, so that the index may be interpreted in relation to the normative value. The relevant population or set of subjects or samples, or relevant population of cells, may have in common a property that is at least one of age range, gender, ethnicity, geographic location, nutritional history, medical condition, clinical indicator, medication, physical activity, body mass, and environmental exposure.

As an example, the index can be constructed, in relation to a normative Gene Expression Profile for a population or set of healthy subjects, in such a way that a reading of approximately 1 characterizes normative Gene Expression Profiles of healthy subjects. Let us further assume that the biological condition that is the subject of the index is inflammation; a reading of 1 in this example thus corresponds to a Gene Expression Profile that matches the norm for healthy subjects. A substantially higher reading then may identify a subject experiencing an inflammatory condition. The use of 1 as identifying a normative value, however, is only one possible choice; another logical choice is to use 0 as identifying the normative value. With this choice, deviations in the index from zero can be indicated in standard deviation units (so that values lying between −1 and +1 encompass approximately 68% of a normally distributed reference population or set of subjects). Since it was determined that Gene Expression Profile values (and accordingly constructed indices based on them) tend to be normally distributed, the 0-centered index constructed in this manner is highly informative. It therefore facilitates use of the index in diagnosis of disease and setting objectives for treatment. The choice of 0 for the normative value, and the use of standard deviation units, for example, are illustrated in FIG. 4B, discussed below.
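A 0-centered index in standard deviation units is a z-score against the normative reference set. The sketch below assumes the reference readings are available as a list and uses the sample standard deviation; it also shows how much of a normal distribution lies within ±k standard deviations, which is what determines the coverage of a given index band. Function names are hypothetical.

```python
import math
import statistics


def standardized_index(value: float, reference: list[float]) -> float:
    """0-centered index: distance of a reading from the normative mean, in SD units."""
    mu = statistics.mean(reference)
    sd = statistics.stdev(reference)  # sample standard deviation of the reference set
    return (value - mu) / sd


def fraction_within(k: float) -> float:
    """Fraction of a normal distribution lying within +/-k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))


print(round(fraction_within(1.0), 3))    # 0.683
print(round(fraction_within(1.645), 2))  # 0.9
```

A band of ±1 SD therefore covers about 68% of a normal reference population, while about ±1.645 SD is needed to cover 90%.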

Still another embodiment is a method of providing an index that is indicative of rheumatoid arthritis or inflammatory conditions related to rheumatoid arthritis of a subject based on a first sample from the subject, the first sample providing a source of RNAs, the method comprising deriving from the first sample a profile data set, the profile data set including a plurality of members, each member being a quantitative measure of the amount of a distinct RNA constituent in a panel of constituents selected so that measurement of the constituents is indicative of the presumptive signs of rheumatoid arthritis, the panel including at least two of the constituents of any of the genes listed in the Inflammation Gene Expression Panel included in Tables 1 and 2. In deriving the profile data set, such measure for each constituent is achieved under measurement conditions that are substantially repeatable, and at least one measure from the profile data set is applied to an index function that provides a mapping from at least one measure of the profile data set into one measure of the presumptive signs of rheumatoid arthritis, so as to produce an index pertinent to the rheumatoid arthritis or inflammatory conditions related to the rheumatoid arthritis of the subject.

As another embodiment of the invention, an index function I of the form

$$I = C_0 + \sum_i C_i M_{1i}^{P1(i)} M_{2i}^{P2(i)},$$

can be employed, where $M_1$ and $M_2$ are values of the member i of the profile data set, $C_i$ is a constant determined without reference to the profile data set, and P1 and P2 are powers to which $M_1$ and $M_2$ are raised. For example, when P1 = P2 = 0, the index function is simply the sum of constants; when P1 = 1 and P2 = 0, the index function is a linear expression; when P1 = P2 = 1, the index function is a quadratic expression. As discussed in further detail below, a quadratic expression that is constructed as a meaningful identifier of Rheumatoid Arthritis (RA) is the following:

$$C_0 + C_1\{TLR2\} + C_2\{CD4\} + C_3\{NFKB1\} + C_4\{TLR2\}\{CD4\} + C_5\{TLR2\}\{NFKB1\} + C_6\{NFKB1\}^2 + C_7\{TLR2\}^2 + C_8\{CD4\}^2,$$

where the constant $C_0$ serves to calibrate this expression to the biological population of interest (such as RA) that is characterized by inflammation. In this embodiment, when the index value equals 0, the odds are 50:50 of the subject being RA vs. normal. More generally, the predicted odds of being RA are $\exp(I_i)$, and therefore the predicted probability of being RA is $\exp(I_i)/[1 + \exp(I_i)]$. Thus, when the index exceeds 0, the predicted probability that a subject is RA is higher than 0.5, and when it falls below 0, the predicted probability is less than 0.5.
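The quadratic RA expression and its reading as log-odds can be sketched together: the index is evaluated from the three constituent measurements, and the probability of RA is exp(I)/(1 + exp(I)). The fitted constants C₀ to C₈ are not given in this section, so the coefficients below are hypothetical placeholders; the two-branch probability function computes the same quantity while avoiding overflow in `math.exp`.

```python
import math


def ra_quadratic_index(tlr2: float, cd4: float, nfkb1: float, c: list[float]) -> float:
    """Quadratic RA index:
    C0 + C1{TLR2} + C2{CD4} + C3{NFKB1} + C4{TLR2}{CD4} + C5{TLR2}{NFKB1}
       + C6{NFKB1}^2 + C7{TLR2}^2 + C8{CD4}^2
    `c` holds C0..C8; fitted values are not given in this section.
    """
    if len(c) != 9:
        raise ValueError("expected nine coefficients C0..C8")
    terms = [1.0, tlr2, cd4, nfkb1, tlr2 * cd4, tlr2 * nfkb1, nfkb1**2, tlr2**2, cd4**2]
    return sum(ci * t for ci, t in zip(c, terms))


def ra_probability(index: float) -> float:
    """Treat the index as log-odds: odds = exp(I), probability = exp(I) / (1 + exp(I))."""
    if index >= 0:
        return 1.0 / (1.0 + math.exp(-index))
    z = math.exp(index)
    return z / (1.0 + z)


# Hypothetical coefficients and measurements, for illustration only.
idx = ra_quadratic_index(1.0, 2.0, 3.0, [1.0] * 9)
print(idx)                  # 26.0
print(ra_probability(0.0))  # 0.5
```

An index of 0 maps to a probability of exactly 0.5, matching the 50:50 odds described in the text.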

The value of $C_0$ may be adjusted to reflect the prior probability of being in this population based on known exogenous risk factors for the subject. In an embodiment where $C_0$ is adjusted as a function of the subject's risk factors, where the subject has prior probability $p_i$ of being RA based on such risk factors, the adjustment is made by increasing (decreasing) the unadjusted $C_0$ value by adding to $C_0$ the natural logarithm of the ratio of the prior odds of being RA taking into account the risk factors to the overall prior odds of being RA without taking into account the risk factors.
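The adjustment adds ln[(subject's prior odds)/(overall prior odds)] to C₀. A minimal sketch, where `p_subject` plays the role of the risk-factor-based prior $p_i$ and `p_overall` is the population prior without risk factors (both parameter names are illustrative assumptions):

```python
import math


def prior_odds(p: float) -> float:
    """Convert a probability to odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return p / (1.0 - p)


def adjust_c0(c0: float, p_subject: float, p_overall: float) -> float:
    """Add ln(prior odds of RA given the subject's risk factors / overall prior odds of RA) to C0.

    Risk factors that raise the prior above the population value increase C0;
    risk factors that lower it decrease C0.
    """
    return c0 + math.log(prior_odds(p_subject) / prior_odds(p_overall))


# Doubling a 1% population prior to 2% shifts C0 up by about 0.703.
print(round(adjust_c0(0.0, 0.02, 0.01), 3))  # 0.703
```

When the subject's prior equals the population prior, the ratio is 1 and $C_0$ is unchanged.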

It was determined that the above quadratic expression for RA may be well approximated by a linear expression of the form:

$$D_0 + D_1\{TLR2\} + D_2\{CD4\} + D_3\{NFKB1\}.$$

Kits

The invention also includes an RA-detection reagent, i.e., nucleic acids that specifically identify one or more rheumatoid arthritis or inflammatory condition related to rheumatoid arthritis nucleic acids (e.g., any gene listed in Tables 1-2 and Tables 4-10; sometimes referred to herein as RA-associated genes) by having homologous nucleic acid sequences, such as oligonucleotide sequences, complementary to a portion of the RA-associated gene nucleic acids, or antibodies to proteins encoded by the RA-associated gene nucleic acids, packaged together in the form of a kit. The oligonucleotides can be fragments of the RA-associated genes. For example, the oligonucleotides can be 200, 150, 100, 50, 25, 10 or fewer nucleotides in length. The kit may contain in separate containers a nucleic acid or antibody (either already bound to a solid matrix or packaged separately with reagents for binding them to the matrix), control formulations (positive and/or negative), and/or a detectable label. Instructions (e.g., written, tape, VCR, CD-ROM, etc.) for carrying out the assay may be included in the kit. The assay may, for example, be in the form of PCR, a Northern hybridization or a sandwich ELISA, as known in the art.

For example, RA-associated gene detection reagents can be immobilized on a solid matrix such as a porous strip to form at least one RA-associated gene detection site. The measurement or detection region of the porous strip may include a plurality of sites containing a nucleic acid. A test strip may also contain sites for negative and/or positive controls. Alternatively, control sites can be located on a separate strip from the test strip. Optionally, the different detection sites may contain different amounts of immobilized nucleic acids, i.e., a higher amount in the first detection site and lesser amounts in subsequent sites. Upon the addition of test sample, the number of sites displaying a detectable signal provides a quantitative indication of the amount of RA-associated genes present in the sample. The detection sites may be configured in any suitably detectable shape and are typically in the shape of a bar or dot spanning the width of a test strip.

Alternatively, the kit contains a nucleic acid substrate array comprising one or more nucleic acid sequences. The nucleic acids on the array specifically identify one or more nucleic acid sequences represented by RA-associated genes (see Tables 1-2 and Tables 4-10). In various embodiments, the expression of 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 40 or 50 or more of the sequences represented by RA-associated genes (see Tables 1-2 and Tables 4-10) can be identified by virtue of binding to the array. The substrate array can be on a solid substrate, e.g., a “chip” as described in U.S. Pat. No. 5,744,305. Alternatively, the substrate array can be a solution array, e.g., Luminex, Cyvera, Vitra and Quantum Dots' Mosaic.

The skilled artisan can routinely make antibodies and nucleic acid probes, e.g., oligonucleotides, aptamers, siRNAs and antisense oligonucleotides, against any of the RA-associated genes listed in Tables 1-2 and Tables 4-10.

Other Embodiments

While the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.

EXAMPLES

In the Examples below, subjects having “active RA” were selected on the basis of the following criteria: 6 or more swollen joints, 9 or more tender joints, and CRP > 2 mg/dL; such subjects may require introduction of more aggressive therapy. Subjects suffering from RA and described as “stable” were responsive to the therapeutic being administered. Subjects suffering from RA and described as “unstable” were those whose disease was not responding to the therapeutic being administered and whose therapeutic was scheduled to be changed.
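The active-RA selection criteria can be expressed as a simple predicate. This sketch assumes that all three numeric criteria must be met together (the text lists them without saying whether any subset suffices) and leaves out the judgment about more aggressive therapy; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClinicalFindings:
    swollen_joints: int
    tender_joints: int
    crp_mg_per_dl: float


def meets_active_ra_criteria(findings: ClinicalFindings) -> bool:
    """Active-RA selection criteria from the Examples; this sketch requires all three."""
    return (
        findings.swollen_joints >= 6
        and findings.tender_joints >= 9
        and findings.crp_mg_per_dl > 2.0
    )


print(meets_active_ra_criteria(ClinicalFindings(8, 12, 3.1)))  # True
print(meets_active_ra_criteria(ClinicalFindings(8, 12, 2.0)))  # False (CRP must exceed 2 mg/dL)
```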

Example 1 Development and Use of Population Normative Values for Gene Expression Profiles

FIG. 1A shows the arithmetic mean values for gene expression profiles (using 48 loci of the Inflammation Gene Expression Panel included in Tables 1 and 2) obtained from whole blood of two distinct patient populations (patient sets). These patient sets are both normal or undiagnosed. The first patient set, which is identified as Bonfils (the plot points for which are represented by diamonds), was composed of 17 subjects accepted as blood donors at the Bonfils Blood Center in Denver, Colo. The second patient set was composed of 16 donors, for which Gene Expression Profiles were obtained from assays conducted four times over a four-week period. Subjects in this second patient set (plot points for which are represented by squares) were recruited from employees of Source Precision Medicine, Inc., the assignee herein. Gene expression averages for each population were calculated for each of 48 gene loci of the Inflammation Gene Expression Panel included in Tables 1 and 2. The results for loci 1-24 (sometimes referred to below as the Inflammation 48A loci) are shown in FIG. 1A and for loci 25-48 (sometimes referred to below as the Inflammation 48B loci) are shown in FIG. 1B.

The consistency between gene expression levels of the two distinct patient sets is dramatic. Both patient sets show gene expression values for each of the 48 loci that are not significantly different from each other. This observation suggests that there is a “normal” expression pattern for human inflammatory genes, that a Gene Expression Profile using the Inflammation Gene Expression Panel included in Tables 1 and 2 (or a subset thereof) characterizes that expression pattern, and that a population-normal expression pattern can be used, for example, to guide medical intervention for any biological condition that results in a change from the normal expression pattern.

In a similar vein, FIG. 2 shows arithmetic mean values for gene expression profiles (again using the 48 loci of the Inflammation Gene Expression Panel included in Tables 1 and 2) also obtained from whole blood of two distinct patient populations (patient sets). One patient set, the expression values for which are represented by triangular data points, was 24 normal, undiagnosed subjects (who therefore had no known inflammatory disease). The other patient set, the expression values for which are represented by diamond-shaped data points, was four patients with rheumatoid arthritis who had failed therapy (and who therefore had unstable rheumatoid arthritis).

As remarkable as the consistency of data from the two distinct normal patient sets shown in FIGS. 1A and 1B was the systematic divergence of data from the normal and diseased patient sets shown in FIG. 2. In 45 of the 48 inflammatory gene loci shown, subjects with unstable rheumatoid arthritis showed, on average, increased inflammatory gene expression (lower cycle threshold values, Ct) compared with subjects without disease. The data thus further demonstrate that it is possible to identify groups with specific biological conditions using gene expression if the precision and calibration of the underlying assay are carefully designed and controlled according to the teachings herein.

Example 2 Consistency of Expression Values of Constituents in Gene Expression Panels Over Time as Reliable Indicators of Biological Condition

FIG. 3 also shows the effect over time, on inflammatory gene expression in a single human subject suffering from rheumatoid arthritis, of the administration of a TNF-inhibiting compound, but here the expression is shown in comparison to the cognate locus average previously determined for the normal (i.e., undiagnosed, healthy) patient set. As part of a larger international study involving patients with rheumatoid arthritis, the subject was followed over a twelve-week period. The subject was enrolled in the study because of a failure to respond to conservative drug therapy for rheumatoid arthritis and a plan to change therapy and begin immediate treatment with a TNF-inhibiting compound. Blood was drawn from the subject prior to initiation of the new therapy (visit 1). After initiation of the new TNF-inhibiting therapy, blood was drawn at 4 weeks (visit 2), 8 weeks (visit 3), and 12 weeks (visit 4) following the start of the new therapy. Blood was collected using the PAXgene Blood RNA System™, held at room temperature for two hours, and then frozen at −30° C.

Frozen samples were shipped to the central laboratory at Source Precision Medicine, the assignee herein, in Boulder, Colo. for determination of expression levels of genes in the 48-gene inflammation gene expression panel included in Tables 1 and 2. The blood samples were thawed and RNA was extracted according to the manufacturer's recommended procedure. RNA was converted to cDNA and the level of expression of the 48 inflammatory genes was determined. Expression results are shown for 11 of the 48 loci in FIG. 3. When the expression results for the 11 loci at visit 1 are compared to a population average of normal blood donors from the United States, the subject shows considerable differences. Similarly, gene expression levels at each of the subsequent physician visits for each locus are compared to the same normal average value. Data from visits 2, 3 and 4 document the effect of the change in therapy. At each visit following the change in therapy, the level of inflammatory gene expression for 10 of the 11 loci is closer to the cognate locus average previously determined for the normal (i.e., undiagnosed, healthy) patient set.

FIG. 4A further illustrates the consistency of inflammatory gene expression, here with respect to 7 loci of the Inflammation Gene Expression Panel included in Tables 1 and 2, in a set of 44 normal, undiagnosed blood donors. For each individual locus, the figure shows the range of values lying within ±2 standard deviations of the mean expression value, which corresponds to 95% of a normally distributed population. Notwithstanding the great width of the confidence interval (95%), the measured gene expression value (ΔCt) remarkably still lies within 10% of the mean, regardless of the expression level involved. As described in further detail below, for a given biological condition an index can be constructed to provide a measurement of the condition. This is possible as a result of the conjunction of two circumstances: (i) there is a remarkable consistency of Gene Expression Profiles with respect to a biological condition across a population, and (ii) procedures can be employed that provide substantially reproducible measurement of constituents in a Gene Expression Panel giving rise to a Gene Expression Profile, under measurement conditions wherein specificity and efficiencies of amplification for all constituents of the panel are substantially similar, and that therefore provide a measurement of a biological condition. Accordingly, a function of the expression values of representative constituent loci of FIG. 4A is here used to generate an inflammation index value, which is normalized so that a reading of 1 corresponds to constituent expression values of healthy subjects, as shown in the right-hand portion of FIG. 4A.
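The ±2 standard deviation range described for FIG. 4A can be sketched as follows. This is a minimal illustration, assuming the range is computed as the sample mean plus or minus two sample standard deviations of the normal donors' ΔCt values for a single locus; the function names and the ΔCt values below are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def reference_interval(normal_values, k=2.0):
    """Return (low, high): mean minus/plus k sample standard deviations."""
    m = mean(normal_values)
    s = stdev(normal_values)  # sample SD (n - 1 denominator)
    return m - k * s, m + k * s


def outside_interval(value, interval):
    """True if value falls outside the closed interval [low, high]."""
    low, high = interval
    return value < low or value > high


# Hypothetical ΔCt values for one locus in normal donors (illustrative only).
normal_dct = [18.2, 18.5, 18.1, 18.4, 18.3, 18.6, 18.0, 18.4]
interval = reference_interval(normal_dct)
print(interval, outside_interval(16.9, interval))
```

Because a lower ΔCt indicates higher expression, a patient value below the low end of the interval would correspond to over-expression relative to the normal population.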

In FIG. 4B, an inflammation index value was determined for each member of a set of 42 normal, undiagnosed blood donors, and the resulting distribution of index values, shown in the figure, can be seen to approximate closely a normal distribution, notwithstanding the relatively small subject set size. The values of the index are shown relative to a 0-based median, with deviations from the median calibrated in standard deviation units. Thus 90% of the subject set lies within +1 and −1 of a 0 value. We have constructed various indices, which exhibit similar behavior.
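The function used to compute the inflammation index is not disclosed in this passage, but the re-expression of index values relative to a 0-based median in standard deviation units can be sketched. The sketch assumes the median and the sample standard deviation are both taken from the normal reference set; `standardize_to_normal_median` and the example values are hypothetical.

```python
from statistics import median, stdev


def standardize_to_normal_median(values, normal_values):
    """Express each value as (value - normal median) / normal sample SD."""
    med = median(normal_values)
    sd = stdev(normal_values)
    if sd == 0:
        raise ValueError("normal reference values have zero spread")
    return [(v - med) / sd for v in values]


# Hypothetical raw index values (illustrative only).
normal_index = [0.8, 0.9, 1.0, 1.1, 1.2]
patient_index = [1.0, 1.5]
print(standardize_to_normal_median(patient_index, normal_index))
```

On this scale a subject at the normal median reads 0, and positive readings indicate index values above the normal median.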

FIG. 4C illustrates the use of the same index as FIG. 4B, where the inflammation median for a normal population of subjects has been set to zero and both normal and diseased subjects are plotted in standard deviation units relative to that median. An inflammation index value was determined for each member of a normal, undiagnosed population of 70 individuals (black bars). The resulting distribution of index values, shown in FIG. 4C, can be seen to approximate closely a normal distribution. Similarly, index values were calculated for individuals from two diseased population groups: (1) rheumatoid arthritis patients treated with methotrexate (MTX) who are about to change therapy to more efficacious drugs (e.g., TNF inhibitors) (hatched bars), and (2) rheumatoid arthritis patients treated with disease modifying anti-rheumatoid drugs (DMARDs) other than MTX, who are about to change therapy to more efficacious drugs (e.g., MTX). Both populations of subjects present index values that are skewed upward (demonstrating increased inflammation) in comparison to the normal distribution. This figure thus illustrates the utility of an index derived from Gene Expression Profile data to evaluate disease status and to provide an objective and quantifiable treatment objective. When these two populations of subjects were treated appropriately, index values from both populations returned to a more normal distribution (data not shown here).

FIG. 5 plots, in a fashion similar to that of FIG. 4A, Gene Expression Profiles for the same 7 loci as in FIG. 4A for two different 6-subject populations of rheumatoid arthritis patients. One population (called “stable” in the figure) is of patients who have responded well to treatment, and the other population (called “unstable” in the figure) is of patients who have not responded well to treatment and whose therapy is scheduled for change. It can be seen that the expression values for the stable patient population lie within the range of the 95% confidence interval, whereas the expression values for the unstable patient population for 5 of the 7 loci are outside and above this range. The right-hand portion of the figure shows an average inflammation index of 9.3 for the unstable population and an average inflammation index of 1.8 for the stable population, compared to 1 for a normal undiagnosed population of patients. The index thus provides a measure of the extent of the underlying inflammatory condition, in this case, rheumatoid arthritis. Hence the index, besides providing a measure of biological condition, can be used to measure the effectiveness of therapy as well as to provide a target for therapeutic intervention.

An algorithm specific for rheumatoid arthritis (shown in FIG. 6), useful for distinguishing subjects afflicted with rheumatoid arthritis from normal subjects or other diseased subjects, was applied. As can be seen in FIG. 6, the index easily distinguishes RA subjects from both normal subjects and subjects suffering from a disease other than RA (e.g., bacteremia subjects).

Example 3 Experimental Design for Evaluating Rheumatoid Arthritis: Assessing Response to Therapeutic Treatment

In a series of studies, the inflammation index was used to assess response to therapeutic treatment by individuals suffering from rheumatoid arthritis, or inflammatory conditions induced by or related to rheumatoid arthritis.

In one study, the inflammation index was used to assess a single subject suffering from rheumatoid arthritis who had not responded well to traditional therapy with methotrexate. The results are depicted in FIG. 7, where the inflammation index for this subject is shown on the far right of the graph at the start of a new therapy (a TNF inhibitor) and then, moving leftward, successively at 2 weeks, 6 weeks, and 12 weeks thereafter. As can be seen from FIG. 7, the index moves towards normal, consistent with physician observation of the patient as responding to the new treatment.

In a similar study, the inflammation index was used to assess three subjects suffering from rheumatoid arthritis who had not responded well to traditional therapy with methotrexate, at the beginning of new treatment (also with a TNF inhibitor) and at 2 weeks and 6 weeks thereafter. The results are depicted in FIG. 8, where the index in each case can again be seen moving generally towards normal, consistent with physician observation of the patients as responding to the new treatment.

In another study, the inflammation index was used to assess three groups of ten international subjects suffering from RA, each of whom had been characterized as stable (that is, not anticipated to be subjected to a change in therapy) by the subject's treating physician, each group having been treated with either methotrexate, Enbrel (a TNF inhibitor), or Remicade (another TNF inhibitor). The results are depicted in FIGS. 9-11. FIG. 9 shows the index for each of the 10 patients in the group being treated with methotrexate, which is known to alleviate symptoms without addressing the underlying disease. FIG. 10 shows the index for each of the 10 patients in the group being treated with Enbrel, and FIG. 11 shows the index for each of the 10 patients being treated with Remicade. It can be seen that the inflammation index for each of the patients in FIG. 9 is elevated compared to normal, whereas in FIG. 10 the patients being treated with Enbrel as a class have an inflammation index that comes much closer to normal (80% in the normal range). In FIG. 11, it can be seen that, while all but one of the patients being treated with Remicade have an inflammation index at or below normal, two of the patients have an abnormally low inflammation index, suggesting an immunosuppressive response to this drug. (Indeed, studies have shown that Remicade has been associated with serious infections in some subjects, and here the immunosuppressive effect is quantified.) Also in FIG. 11, one subject has an inflammation index that is significantly above the normal range. This subject was in fact also on a tapering regimen of an anti-inflammatory steroid (prednisone); within approximately one week after the inflammation index was sampled, the subject experienced a significant flare of clinical symptoms.

Remarkably, these examples show a measurement, derived from the assay of blood taken from a subject, pertinent to the subject's arthritic condition. Given that the measurement pertains to the extent of inflammation, it can be expected that other inflammation-based conditions, including, for example, cardiovascular disease, may be monitored in a similar fashion.

In one study, a statistical T-test was applied to identify potential members of a signature gene expression panel that were capable of distinguishing between normal subjects (n=69) and subjects suffering from unstable rheumatoid arthritis (n=23). The results are depicted in FIG. 12, where the grayed boxes show genes that are individually highly effective (T-test P values noted in the box to the right in each case) in distinguishing between the two sets of subjects, and thus indicative of potential members of a signature gene expression panel for rheumatoid arthritis.
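A two-sample t statistic of the kind used to screen candidate genes can be sketched with the Python standard library. The text does not state which t-test variant was used; this sketch assumes the pooled-variance (equal-variance) Student form, and the function name and example values are hypothetical. A p-value can then be obtained from the t distribution with the returned degrees of freedom, for example with `scipy.stats.t.sf`, or the whole test can be run with `scipy.stats.ttest_ind`.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Student two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two values")
    # Pooled variance assumes both groups share a common variance.
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical ΔCt values for one gene (illustrative only).
normals = [16.0, 16.2, 15.9, 16.1]
unstable_ra = [14.9, 15.1, 14.8, 15.0]
print(pooled_t_statistic(normals, unstable_ra))
```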

FIG. 13 shows a comparison of the change in relative gene expression of 24 patients unstable on Disease Modifying Anti-Rheumatoid Drug (DMARD) therapy, 19 patients stable on DMARD therapy, and 20 patients stable on TNF blockers, relative to the mean (ΔCt) of normal gene expression, for 11 genes of interest for RA patients. As can be seen, for patients unstable on DMARD therapy, the relative change in expression (indicated by X-fold change relative to the mean for normal gene expression) ranges from 3.45-fold greater expression than the mean for MMP9 to 2.23-fold greater expression than the mean for IL10. Perhaps more surprisingly, patients who were stable on DMARD therapy, suggesting that the therapy was successfully treating the RA condition, showed anywhere from 2.75-fold greater expression than the mean for MMP9 to 1.29-fold greater expression than the mean for IL10. While not intending to be bound by any theory, these results perhaps suggest that the DMARD therapy was in fact not treating the underlying condition, but merely treating the symptoms of RA. In fact, for 10 of the 11 genes investigated in this study, all patients, whether unstable or stable on DMARD therapy, exhibited statistically significant increased gene expression relative to the mean for normal gene expression (see the change in expression for genes MMP9, CD14, TIMP1, HSPA1A, TGFB1, IL10, IL1RN, CXCL1, IL1B, and PTGS2). Only the expression of the gene CD19 was not increased relative to the mean for normal gene expression; in fact, the values of 0.64 and 0.68 for patients unstable or stable on DMARD therapy, respectively, are statistically significantly lower than the normal expression for this gene. The t-test p values are shown on the right half of this table, indicating which changes in gene expression shown on the left half of the table are statistically significant.
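The X-fold values in FIG. 13 are expressed relative to the normal mean ΔCt. The conversion is not spelled out here; a common convention, assumed in this sketch, is fold change = 2^(−ΔΔCt), where ΔΔCt is the group's ΔCt minus the normal mean ΔCt and amplification efficiency is taken to be close to 100% (one doubling per cycle). Under this convention a lower ΔCt gives a fold change above 1, consistent with lower Ct indicating increased expression. The function names are hypothetical.

```python
from math import log2


def fold_change(group_mean_dct, normal_mean_dct):
    """Fold change vs. normal, assuming ~100% efficiency: 2 ** -(ΔΔCt)."""
    delta_delta_ct = group_mean_dct - normal_mean_dct
    return 2.0 ** (-delta_delta_ct)


def delta_delta_ct_for(fold):
    """Inverse of fold_change: the ΔΔCt that yields a given fold change."""
    return -log2(fold)


print(fold_change(15.0, 16.0))    # one cycle earlier than normal: 2.0
print(delta_delta_ct_for(3.45))   # ΔΔCt implied by a 3.45-fold increase
```

Under this assumption, the 3.45-fold increase reported for MMP9 would correspond to a ΔΔCt of about −1.79 cycles.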

In contrast, for patients being treated with TNF blockers, one can see that the gene expression levels were all within the statistical mean compared to normal gene expression, and the expression levels for IL10, IL1RN, CXCL2, PTGS2, and CD19 were in fact statistically significantly lower than the mean for normal expression levels. While not intending to be bound by any theory, these results support the position that even patients responding to DMARD therapy should be treated more aggressively and placed on TNF blockers earlier. This is strong clinical evidence that traditional DMARD therapy may not, in fact, be treating the underlying cause or condition of rheumatoid arthritis, and that patients so treated are at risk for progression of the disease if more aggressive therapy, such as treatment with TNF blockers, is not initiated.

FIG. 14 shows gene expression values for 8 genes in 17 patients with active RA (i.e., each patient had 6 or more swollen joints, 9 or more tender joints, and CRP greater than 2 mg/dL), expressed as standard deviation changes from normal. The light-shaded boxes indicate all gene expression values for the various patients that are more than 2 standard deviation units above or below the normal values and are statistically significant. As can be seen, nearly all of these patients had changes of more than two standard deviations for several of the genes investigated. Moreover, using a standard algorithm referred to herein as the IR-105 algorithm (statistically derived algorithms are discussed in further detail above), 15 of the 17 patients were assessed with IR-105 values greater than 2, suggesting that these patients could benefit from more aggressive therapy.

FIG. 15 shows a study of 22 patients with active RA (same criteria as for FIG. 14), performed by different rheumatologists on patients different from those in FIG. 14, in which similar results are observed. Twenty of the 22 patients exhibit changes of more than 2 standard deviation units from the values for normal gene expression. In particular, the values for the genes TGFB1 and TIMP1 are markedly increased in most of the patients. As with the results in FIG. 14, these data suggest that the patients are not responding to the current therapy, as these results look similar to those for patients in FIG. 14 who failed DMARD therapy. Again, these data suggest that more aggressive therapy, in addition to or in place of current treatment, may be in order for these patients.

FIG. 16 displays the gene expression values, expressed as change in standard deviation units relative to normal expression values, for 19 patients who were stable on TNF blockers for 3 months prior to the study. As can be seen, the gene expression values for these patients are almost all within “normal.” The exception for patient 01a is explainable and does not dilute the value of this study. Patient 01a was in fact not stable 3 months prior to the study but rather flared 1 month prior to the study and was treated aggressively with prednisone; the data for this patient come from 1 week prior to the flare. These data show the predictive value of gene expression analysis for such patients, since the values for TGFB1, TIMP1 and IL1RN for this patient were higher than they should have been had the patient in fact been stable, but are what might be expected for a patient in an active RA condition.

Similarly, patients 13 and 18 have gene expression values for PTGS2, IL1B, and IL1RN that are underexpressed relative to the normal values, suggesting that these patients are trending towards immunosuppression. Again, these data show the predictive value of gene expression analysis for identifying patients at risk for immunosuppression.

FIG. 17 shows the changes in gene expression values relative to normals (n=69 for this study) for 22 patients being treated with Kineret or Kineret plus sTNF-R1, at baseline, at week 4, and at termination of the study. The relative expression of 11 of the 12 genes examined was elevated at the start of the study and still elevated, but less so, at week 4; by the end of the study, 7 genes were still over-expressed to some extent, while 4 genes had transitioned to the under-expressed category (IL1RN, CXCL1, and PTGS2, light boxes). Of interest, the values of CD19 were under-expressed from the start of the study until the end. It is worth noting that by the end of the study, the expression of PTGS2 (prostaglandin-endoperoxide synthase 2, i.e., COX2) was not statistically significantly different from normal. These data indicate how gene expression data can track effective therapy in RA patients.

In another study, a panel of 10 genes known as the Inflammation Index was used to evaluate the gene expression profiles for clinically stable RA patients treated with various therapeutics. The results are depicted in FIGS. 18-22. FIG. 18 shows gene expression values for clinically stable RA patients being treated with methotrexate for a panel of 10 genes known as the Inflammation Index (in standard deviation units).

FIG. 19 shows gene expression values for clinically stable RA patients being treated with Enbrel (etanercept) for a panel of 10 genes known as the Inflammation Index (in standard deviation units).

FIG. 20 shows gene expression values for clinically stable RA patients being treated with Remicade (infliximab) for a panel of 10 genes known as the Inflammation Index (in standard deviation units).

FIG. 21 shows gene expression values for additional clinically stable RA patients being treated with Remicade (infliximab) for a panel of 10 genes known as the Inflammation Index (in standard deviation units).

FIG. 22 shows an assessment of the data presented in FIGS. 20 and 21, wherein a TNF inhibitor study performed on the patients (Remicade/methotrexate program) resulted in a flare-up for patient 01a (see FIG. 21), requiring aggressive treatment involving increased TNF inhibitor dosage. The results for this patient were discussed previously with respect to FIG. 16.

Example 4 Correlating Clinical Assessment with High Precision Gene Expression Profiling

FIG. 23 shows the response of 22 patients over time with respect to 5 different clinical assessments traditionally used by clinicians, for a study treating RA patients with Kineret or sTNF-R1 plus Kineret: Disease Activity Score (DAS); Swollen Joint Count (SJC); Tender Joint Count (TJC); MD Assessment of Disease (MDAD); and Health Assessment Questionnaire (HAQ) self score, based on the patient's evaluation of quality-of-life factors. As can be seen, the DAS decreased from the baseline of 6.51 to 5.35 after 4 weeks and down to 4.88 at the end of the study, and the joint counts, MD assessment and patient self-score also all went down over the course of the study and treatment period.

FIG. 24 shows Pearson Correlation Coefficients across all values for 5 clinical assessment methods traditionally used by physicians to monitor RA patients. As can be seen, the values for swollen joint count (SJC) and tender joint count (TJC) are not always statistically significant, but those for DAS tend to have statistical significance for most genes examined in this study. This study shows how it is possible to determine associations with clinical endpoints using simple correlations.
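A Pearson correlation coefficient between a gene's ΔCt values and a clinical score such as DAS can be computed as sketched below. The function name and example inputs are hypothetical; on Python 3.10 or later, `statistics.correlation` returns the same quantity.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical paired values: one gene's ΔCt and DAS over five visits.
gene_dct = [15.2, 15.6, 15.9, 16.3, 16.4]
das = [6.5, 6.0, 5.4, 5.0, 4.9]
print(pearson_r(gene_dct, das))
```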

FIG. 25 uses DAS, which is becoming the gold standard, as the comparator for evaluating the usefulness of the methodologies described and claimed herein, using a gene expression panel selected to enable diagnosis, evaluation, and treatment of RA in patients. As can be seen, a mixed model analysis (i.e., multiple genes) indicates that the p-values for genes associated with determination of DAS values are statistically significant, indicating that the gene expression values determined for these genes in such a panel can be used in place of the traditional physician's DAS determination. The advantages are clear. A physician needs at least an hour, sometimes more, to determine a DAS value for a patient, and many of the indicators used to determine the DAS value are subjective. In contrast, using gene expression values for the panel of genes shown here, for example, allows an off-site, fast and totally objective assessment of a patient's RA status that rivals, if not exceeds, the accuracy of the in-clinic assessment traditionally done by a physician.

FIG. 26 shows the relationship of gene expression to the physician's assessment of disease, using either a simple correlation method of analysis (as in FIG. 24) or a mixed model method of analysis (as in FIG. 25), for gene expression values determined for a number of genes of interest in unstable RA patients undergoing treatment with methotrexate.

Example 5 Clinical Data Analyzed with Latent Class Modeling (1-Gene, 2-Gene, 3-Gene Models)

From a targeted 76-gene panel, selected to be informative relative to the biological state of RA patients, primers and probes were prepared for a subset of 47 genes (those with p-values of 0.05 or better) (Table 3). Gene expression profiles were obtained using this subset of genes, and of these individual genes, TLR2 was found to be uniquely and exquisitely informative regarding RA, yielding the best discrimination from normals of the genes examined.

A ranking of the top 47 genes is shown in Table 3, summarizing the results of significance tests for the difference in the mean expression levels between Normals and RAs. Since competing methods are available that are justified under different assumptions, the p-values were computed in 2 different ways:

-   1) Based on 1-way ANOVA. This approach assumes that the gene expression is normally distributed with the same variance within each of the 2 populations.
-   2) Based on stepwise logistic regression (STEP analysis), where group membership (Normal vs. RA) is predicted as a function of the gene expression. Conceptually, this is the reverse of what is done in the ANOVA approach, where the gene expression is predicted as a function of the group. The logistic regression model holds true under several different distributional assumptions, including those that are made in the 1-way ANOVA approach. Thus, this second strategy is justified under a more general class of distributional assumptions than the ANOVA approach.
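For two groups, the 1-way ANOVA F statistic described in item 1) equals the square of the pooled-variance two-sample t statistic, so both give the same p-value. The sketch below computes the F statistic from between-group and within-group sums of squares and checks it against t²; the function names and data are hypothetical, and it is not the STEP (logistic regression) analysis of item 2).

```python
from math import sqrt
from statistics import mean, variance


def one_way_anova_f(groups):
    """One-way ANOVA F = (SS_between / (k - 1)) / (SS_within / (n - k))."""
    values = [v for g in groups for v in g]
    k, n = len(groups), len(values)
    grand_mean = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def pooled_t(a, b):
    """Pooled-variance two-sample t statistic, for comparison with F."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))


normals = [16.0, 16.2, 15.9, 16.1]  # hypothetical ΔCt values
ras = [14.9, 15.1, 14.8, 15.0]
f = one_way_anova_f([normals, ras])
print(f, pooled_t(normals, ras) ** 2)  # equal up to rounding for 2 groups
```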

As expected, the two different approaches yield comparable p-values and comparable rankings for the genes. Only 8 genes were found not to be significant at the 0.05 level. TLR2 was found to be the most significant. Table 3 shows that, compared to the Normals, the washed-out RAs tend to be under-expressed with respect to TLR2.

Gene expression profiles obtained using this subset of genes were analyzed with the Search procedure in GOLDMineR (Magidson, 1998), which was used to implement stepwise logistic regressions (STEP analysis) predicting the dichotomous variable that distinguishes RAs from Normals as a function of all 47 genes in an RA longitudinal study, i.e., a study that followed RA patients over an extended period of time after initiating Interleukin-1 receptor antagonist (IL1ra) or IL1ra plus soluble TNF-α receptor-1 (IL1ra+sTNFR1) therapy (N=22 ‘washed-out’ RA subjects, and N=134 Normal subjects). The STEP analysis was performed under the assumption that the gene expressions follow a multinormal distribution, with different means and different variance-covariance matrices for the Normal and RA populations.

Actual correct classification rates for the RA patients and the normal subjects were computed. Multi-gene models were constructed that were capable of correctly classifying RA and normal subjects with at least 75% accuracy. These results are shown in Tables 4-10 below. As demonstrated in Tables 5-6 and Tables 8-10, as few as two genes allow discrimination between individuals with RA and normals at an accuracy of at least 75%.

One Gene Model

A STEP analysis was first used to find the most significant genes to use as the first gene in two-gene models. All 47 genes were evaluated for significance (i.e., p-value) regarding their ability to discriminate between RA and Normals, and were ranked in order of significance (see Table 4). For each gene, the cutoff on the ΔCt value that maximized the overall correct classification rate was chosen. The actual correct classification rates for the RA and Normal subjects were computed based on this cutoff, and it was determined whether both reached the 75% criterion. None of these 1-gene models satisfied the 75%/75% criterion.
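The one-gene cutoff search can be sketched as a scan over candidate ΔCt cutoffs that keeps the cutoff with the highest overall correct classification rate, followed by a check of the 75%/75% criterion. This is a minimal illustration, assuming candidate cutoffs are the observed values and that RA is called when ΔCt is at or below the cutoff (the `ra_below` flag reverses the direction); the function names and data are hypothetical.

```python
def best_cutoff(ra_values, normal_values, ra_below=True):
    """Scan observed values as cutoffs; keep the one maximizing overall accuracy.

    Returns (cutoff, overall_rate, ra_rate, normal_rate). With ra_below=True a
    subject is called RA when its ΔCt is <= cutoff, otherwise when it is >= cutoff.
    """
    n_ra, n_normal = len(ra_values), len(normal_values)
    best = None
    for c in sorted(set(ra_values) | set(normal_values)):
        if ra_below:
            ra_ok = sum(v <= c for v in ra_values)
            normal_ok = sum(v > c for v in normal_values)
        else:
            ra_ok = sum(v >= c for v in ra_values)
            normal_ok = sum(v < c for v in normal_values)
        overall = (ra_ok + normal_ok) / (n_ra + n_normal)
        if best is None or overall > best[1]:
            best = (c, overall, ra_ok / n_ra, normal_ok / n_normal)
    return best


def meets_criterion(ra_rate, normal_rate, threshold=0.75):
    """The 75%/75% criterion: both groups must be classified at >= threshold."""
    return ra_rate >= threshold and normal_rate >= threshold


# Hypothetical ΔCt values for one gene (illustrative only).
ra = [14.0, 14.5, 15.0]
normal = [15.2, 15.5, 16.0, 16.5]
cutoff, overall, ra_rate, normal_rate = best_cutoff(ra, normal)
print(cutoff, overall, meets_criterion(ra_rate, normal_rate))
```

Maximizing the overall rate can still leave the smaller group poorly classified, which is why the per-group 75%/75% check is applied separately.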

Two Gene Model

The top 6 genes (lowest p-values discriminating between RA and Normals, highlighted in Table 4) were subjected to further analysis in a two-gene model. Each of the top 6 genes, one at a time, was used as the first gene in a 2-gene model, where all 46 remaining genes were evaluated as the second gene in this 2-gene model. All models that yielded significant incremental p-values, at the 0.05 level, for the second gene were then analyzed using Latent Gold to find R² values. The R² statistic is a less formal statistical measure of goodness of prediction, which varies from 0 (the predicted probability of being RA is constant regardless of the ΔCt values on the 2 genes) to 1 (the predicted probability of being RA is 1 for each RA subject and 0 for each Normal subject). If the 2-gene model yielded an R² value greater than 0.6, it was kept as a model that discriminated well. For models that met the 0.6 cutoff, the statistical output from Latent Gold was then used to determine classification percentages (shown in Table 5).

Three Gene Model

The 2-gene models that discriminated well were subjected to further STEP analyses, serving as the first two genes of 3-gene models in which all 45 remaining genes were evaluated as the third gene. Again, Latent Gold was used to determine R² values as well as p-values for each gene. All models that yielded significant incremental p-values, at the 0.05 level, for the third gene were then analyzed using Latent Gold to find R² values. For all 3-gene models that yielded an R² value greater than 0.6, classification percentages were determined using their statistical output from Latent Gold (shown in Table 6).

Without taking into account group membership (normals vs. RA) in the estimation of the model parameters, the 3-gene model perfectly distinguished the 2 groups. The most significant of these genes was TLR2. Given TLR2, a second gene, CD4, made the most significant incremental contribution towards the discrimination between the normals and RAs, and including a third gene, NFKB1, in the model led to perfect discrimination. Other 3-gene models are described in Table 6. Overall, among the 47 genes for which measurements were obtained, the expression levels of at least 38 genes were found to differ significantly (p<0.05) between the 134 normals and 22 ‘washed-out’ RAs.

FIG. 27 shows the strong discrimination provided by TLR2 and CD4. For normals, the mean expression levels are (TLR2, CD4)=(16.1, 14.8); for RAs, (TLR2, CD4)=(14.8, 15.1). Within Normals, TLR2 and CD4 have a significant positive correlation (r=0.577), but within RAs, the correlation is not significantly different from 0 (r=0.061).

Given these 2 genes, adding a third (NFKB1) provided perfect prediction. Like TLR2, RAs are under-expressed on NFKB1, and the relationship between NFKB1 and CD4 (see FIG. 28) is similar to the relationship between TLR2 and CD4 shown in FIG. 27. (However, unlike the relationship shown in FIG. 27 for TLR2 and CD4, the correlation between NFKB1 and CD4 is large for both Normals (r=0.799) and RAs (r=0.738).)

Again, with these 3 genes included in the model, prediction was perfect. In fact, the predicted probability of being an RA was 1 for each RA in the sample and 0 for each Normal case (to 8 decimal places). (The predicted probability incorporates the prior probability of being an RA, which is reflected in the intercept of the logit model. Since this analysis consists of 22 RAs out of a total sample of 154, the prior probability was set at 22/154. Alternative priors can be used, which change the intercept; a change in the prior does not affect the other estimated coefficients. For example, to use a prior probability of 0.5 of being an RA, the current prior odds of 22/132 would be multiplied by 132/22, which is equivalent to adding log(132/22)=1.792 (natural logarithm) to the intercept. To use a prior odds of 1:1000, the current prior odds of 22/132 would need to be multiplied by (132/22)×(1/1000)=0.006, and thus log(0.006)=−5.116 would be added to the intercept (i.e., 5.116 would be subtracted from it). In both of these cases, prediction still would be perfect; that is, the predicted Prob(RA) is close to 1 for RAs and close to 0 for Normals. The predicted probabilities are not affected much by changes in the prior probability. For example, if the prior RA:Normal odds is taken anywhere in the range between 1:100,000 and 100,000:1, the predicted probability of being an RA would still be 1.00 for each true RA and 0.00 for each true normal (to at least 2 decimal places).)
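The intercept adjustment for a different prior can be checked numerically. As a minimal sketch, assuming natural logarithms and the sample odds of 22/132 used in the text, the shift added to the logit intercept is log(new prior odds / sample odds); the function name is hypothetical.

```python
from math import log


def intercept_shift(n_ra, n_normal, new_prior_odds):
    """Amount to add to the logit intercept when replacing the sample prior
    odds (n_ra / n_normal) with new_prior_odds (natural logarithm)."""
    sample_odds = n_ra / n_normal
    return log(new_prior_odds / sample_odds)


print(round(intercept_shift(22, 132, 1.0), 3))       # prior probability 0.5: 1.792
print(round(intercept_shift(22, 132, 1 / 1000), 3))  # prior odds 1:1000: -5.116
```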

The following model was used:

Logit(RA) = 1020.9 + 165.3*CD4 − 115.8*NFKB1 − 101.3*TLR2

where ‘Logit’ is the logarithm of the odds of being an RA as opposed to a Normal. Thus,

Prob(RA) = exp[Logit(RA)] / {1 + exp[Logit(RA)]},

where Prob(RA) is the probability of being an RA under the assumption that there are only 2 populations: Normals and RAs.
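The following minimal Python sketch shows how the fitted equation converts expression values into Prob(RA). The coefficients are those reported above. The function names and example inputs are hypothetical. The logistic transform is written in a numerically stable form because the very large coefficients push Logit(RA) far from 0 for most inputs.

```python
import math

# Coefficients of the 3-gene logit model reported above (sample prior 22/154).
INTERCEPT = 1020.9
B_CD4, B_NFKB1, B_TLR2 = 165.3, -115.8, -101.3

def logit_ra(cd4, nfkb1, tlr2, intercept=INTERCEPT):
    """Log-odds of being an RA rather than a Normal."""
    return intercept + B_CD4 * cd4 + B_NFKB1 * nfkb1 + B_TLR2 * tlr2

def prob_ra(cd4, nfkb1, tlr2, intercept=INTERCEPT):
    """Prob(RA) = exp(L) / (1 + exp(L)), computed without overflow."""
    z = logit_ra(cd4, nfkb1, tlr2, intercept)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical inputs: the CD4 value at which Prob(RA) = 0.5 when NFKB1 = 17, TLR2 = 16.
cd4_boundary = (-B_NFKB1 * 17 - B_TLR2 * 16 - INTERCEPT) / B_CD4
print(round(cd4_boundary, 3))
print(prob_ra(cd4_boundary + 0.05, 17, 16) > 0.5)   # True: higher CD4 raises the odds
```

A shift of only 0.05 in CD4 moves the logit by about 8 units. This illustrates why the fitted probabilities are essentially 0 or 1.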

Higher values for CD4 and lower values for NFKB1 and TLR2 increase the odds (probability) of RA. Although these are maximum likelihood estimates, no standard errors are available for the model parameters because the model separates the two groups perfectly. To see whether the model could be simplified without losing its perfect prediction, the coefficients for NFKB1 and TLR2 were constrained to be equal and the model was re-estimated. (This was accomplished by creating a new variable, called tPn, equal to the sum of TLR2 and NFKB1, and using it together with CD4 in the logit model.) The prediction remained perfect, as shown by the separation line added to FIG. 29. The more parsimonious the model (the fewer its parameters), the more likely it is to validate on other data, and using fewer genes reduces the number of model parameters. Here, only 3 genes are used, and the number of parameters was further reduced by equating the effects of 2 of them.

FIG. 29 plots tPn against CD4 and shows that a line can perfectly distinguish the two groups: all cases above the line are Normals, and all cases below it are RAs. This separation line is not unique; infinitely many lines provide perfect discrimination. (Because the discrimination is perfect, it is not possible to determine, without additional data containing some amount of imperfect discrimination, whether the unweighted sum of TLR2 and NFKB1 is optimal or whether some other weighting, such as tPn2 = 1.15*NFKB1 + TLR2, would be better. That is, infinitely many weightings of these 2 genes provide perfect discrimination.) However, a unique equation, and estimated probabilities of being an RA strictly between 0 and 1, can be obtained by making some distributional assumptions. This approach is taken in the section below on estimating a latent class model. Alternatively, a conditional model can be estimated, for example, by predicting tPn as a function of CD4. FIGS. 30 and 31 show the best least squares fitting line and the lower 95% prediction limit for the Normals data, giving the expected value of tPn and of tPn2 for a particular expression level of CD4. The RA population falls below the lower limit.
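The conditional approach can be sketched as follows:

1. Fit an ordinary least squares line of tPn on CD4 within the Normals.
2. Check whether a new case falls below the lower prediction limit.

This is a minimal pure-Python sketch, and several of its choices are assumptions:

- It uses the normal quantile as a large-sample stand-in for the t quantile. With about 130 Normals the difference is small.
- It treats the "lower 95% limit" as the lower edge of a two-sided 95% interval.
- The function names are hypothetical, and the example data are made up.

```python
import math
from statistics import NormalDist

def fit_line(x, y):
    """Least squares fit y = a + b*x; returns the pieces needed for prediction limits."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return {"a": a, "b": b, "s2": rss / (n - 2), "n": n, "mx": mx, "sxx": sxx}

def lower_prediction_limit(fit, x0, level=0.95):
    """Lower edge of the two-sided prediction interval for a new y at x0."""
    z = NormalDist().inv_cdf(0.5 + level / 2)      # ~1.96; normal approx. to t
    se = math.sqrt(fit["s2"] * (1 + 1 / fit["n"] + (x0 - fit["mx"]) ** 2 / fit["sxx"]))
    return fit["a"] + fit["b"] * x0 - z * se

def below_normal_band(fit, cd4, tpn, level=0.95):
    """Flag a case whose tPn is lower than expected for a Normal with this CD4."""
    return tpn < lower_prediction_limit(fit, cd4, level)

# Made-up Normals: (CD4, tPn) pairs.
cd4_n = [14.2, 14.5, 14.8, 15.0, 15.3, 14.6, 14.9, 15.1]
tpn_n = [32.6, 33.1, 33.5, 33.8, 34.3, 33.2, 33.6, 34.0]
fit = fit_line(cd4_n, tpn_n)
print(below_normal_band(fit, 15.1, 31.0))   # True: a hypothetical RA-like case
```

Unlike the separating line, this rule uses only the Normals to define the boundary. A case is flagged whenever it is unusually low relative to Normals with similar CD4.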

Example 6 Clinical Data Analyzed with Latent Class Modeling (1-Gene, 2-Gene, 3-Gene and 4-Gene Models)

From a targeted 76-gene panel, selected to be informative relative to the biological state of RA patients, primers and probes were prepared for a subset of 24 genes (those with p-values of 0.05 or better). Gene expression profiles were obtained using this subset, and of these individual genes, ICAM was found to be uniquely and exquisitely informative regarding RA, yielding the best discrimination from Normals of the genes examined.

The Search procedure in GOLDMineR (Magidson, 1998) was used to implement stepwise logistic regressions (STEP analysis). These regressions predicted the dichotomous variable that distinguishes RAs from Normals as a function of all 47 genes in an RA longitudinal study. This study followed RA patients over an extended period of time after they initiated NSAIDs, methotrexate, or new TNF-inhibitor therapy (N=20 ‘washed-out’ RA subjects and N=32 Normals). STEP analysis was used to obtain predictions under 1-gene, 2-gene, 3-gene, and 4-gene models. As described infra, 4-gene models provided perfect discrimination between the RA and Normal populations.

One Gene Model

A STEP analysis was first used to find the most significant genes to use as the first gene in 2-gene models. All 24 genes were evaluated for significance (i.e., p-value) in discriminating between RAs and Normals and were ranked in order of significance (see Table 7). For each gene, the cutoff on the ΔCt value that maximized the overall correct classification rate was chosen. The correct classification rates for the RA and Normal subjects were then computed based on this cutoff to determine whether both rates reached the 75% criterion. None of these 1-gene models satisfied the 75%/75% criterion.
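The 1-gene screening step can be sketched as an exhaustive search over candidate cutoffs. This minimal Python sketch makes the following assumptions, and its function names are hypothetical:

- Candidate cutoffs are the midpoints between adjacent observed values.
- Both directions (RA below or above the cutoff) are tried, because the text does not state which side corresponds to RA for each gene.

After the search, the sketch checks the 75%/75% criterion on the RA and Normal correct classification rates.

```python
def rates(cutoff, ra, normal, ra_below):
    """Correct classification rates (RA, Normal) for one cutoff and direction."""
    if ra_below:
        ra_rate = sum(v < cutoff for v in ra) / len(ra)
        normal_rate = sum(v >= cutoff for v in normal) / len(normal)
    else:
        ra_rate = sum(v > cutoff for v in ra) / len(ra)
        normal_rate = sum(v <= cutoff for v in normal) / len(normal)
    return ra_rate, normal_rate

def best_cutoff(ra, normal):
    """Cutoff maximizing the overall correct classification rate.

    Returns (overall_rate, cutoff, ra_below, ra_rate, normal_rate), or None
    when fewer than two distinct values are observed.
    """
    values = sorted(set(ra) | set(normal))
    candidates = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]
    n = len(ra) + len(normal)
    best = None
    for cutoff in candidates:
        for ra_below in (True, False):
            ra_rate, normal_rate = rates(cutoff, ra, normal, ra_below)
            overall = (ra_rate * len(ra) + normal_rate * len(normal)) / n
            if best is None or overall > best[0]:
                best = (overall, cutoff, ra_below, ra_rate, normal_rate)
    return best

def meets_75_75(ra_rate, normal_rate, threshold=0.75):
    """True when both groups are classified correctly at least 75% of the time."""
    return ra_rate >= threshold and normal_rate >= threshold

# Made-up ΔCt values for one gene.
ra_dct = [14.6, 14.9, 15.0, 15.4, 15.8]
normal_dct = [15.2, 15.6, 15.9, 16.1, 16.3, 16.6]
overall, cutoff, ra_below, ra_rate, normal_rate = best_cutoff(ra_dct, normal_dct)
print(cutoff, ra_below, round(ra_rate, 2), round(normal_rate, 2), meets_75_75(ra_rate, normal_rate))
```

Maximizing the overall rate can favor the larger group. For this reason, the 75%/75% criterion is checked separately on each group's rate.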

Two Gene Model

The top 6 genes (those with the lowest p-values for discriminating between RAs and Normals, highlighted in Table 7) were subjected to further analysis in a 2-gene model. Each of the top 6 genes, one at a time, was used as the first gene in a 2-gene model, and all 23 remaining genes were evaluated as the second gene. All models that yielded significant incremental p-values, at the 0.05 level, for the second gene were then analyzed using Latent Gold to find their R² values. The R² statistic is a less formal statistical measure of goodness of prediction. It ranges from 0 (the predicted probability of being an RA is constant regardless of the ΔCt values of the 2 genes) to 1 (the predicted probability of being an RA is 1 for each RA subject and 0 for each Normal subject). A 2-gene model with an R² value greater than 0.6 was kept as a model that discriminated well. Latent Gold or GOLDMineR was then used to determine its classification percentages (shown in Table 8).

Three Gene Model

The 2-gene models that discriminated well were subjected to further STEP analyses, serving as the first two genes of 3-gene models in which all 22 remaining genes were evaluated as the third gene. Again, Latent Gold was used to determine R² values as well as p-values for each gene. All models that yielded significant incremental p-values, at the 0.05 level, for the third gene were then analyzed using Latent Gold to find R² values. For all 3-gene models that yielded an R² value greater than 0.6, classification percentages were determined using Latent Gold or GOLDMineR (shown in Table 9).

Four Gene Model

The 3-gene models that discriminated well were subjected to further STEP analyses, serving as the first 3 genes of 4-gene models in which all 21 remaining genes were evaluated as the 4^(th) gene. Again, Latent Gold was used to determine R² values as well as p-values for each gene. All models that yielded significant incremental p-values, at the 0.05 level, for the fourth gene were then analyzed using Latent Gold to find R² values. For all 4-gene models that yielded an R² value greater than 0.6, classification percentages were determined using Latent Gold or GOLDMineR (shown in Table 10).
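The 2-, 3- and 4-gene stages share one pattern, sketched below:

1. Extend each retained model by every remaining gene.
2. Keep extensions whose added gene is significant at the 0.05 level and whose R² exceeds 0.6.
3. Repeat with the kept models.

This is a minimal Python sketch of that screening loop. The `evaluate` callback is a hypothetical stand-in for the GOLDMineR/Latent Gold fits: it must return the incremental p-value and R² of a candidate model. Skipping models that contain the same genes in a different order is an assumption the text does not state.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Model = Tuple[str, ...]

def extend_models(
    base_models: Iterable[Model],
    genes: Sequence[str],
    evaluate: Callable[[Model], Tuple[float, float]],
    p_max: float = 0.05,
    r2_min: float = 0.6,
) -> List[Tuple[Model, float]]:
    """Add one gene to each base model; keep candidates passing both filters."""
    kept: List[Tuple[Model, float]] = []
    seen = set()
    for base in base_models:
        for gene in genes:
            if gene in base:
                continue
            candidate = base + (gene,)
            key = frozenset(candidate)
            if key in seen:          # same gene set already evaluated
                continue
            seen.add(key)
            p_incremental, r2 = evaluate(candidate)
            if p_incremental < p_max and r2 > r2_min:
                kept.append((candidate, r2))
    return kept

def stepwise_search(first_genes, genes, evaluate, max_genes=4):
    """Run the 2-gene, 3-gene, ... stages, returning the kept models per size."""
    stages = {}
    current = [(g,) for g in first_genes]
    for size in range(2, max_genes + 1):
        kept = extend_models(current, genes, evaluate)
        stages[size] = kept
        current = [model for model, _ in kept]
    return stages

def toy_evaluate(model):
    # Made-up scores: models containing "ICAM" look strong, others do not.
    return (0.01, 0.7) if "ICAM" in model else (0.3, 0.2)

print(stepwise_search(["ICAM"], ["ICAM", "HLADRA", "HSPA1A"], toy_evaluate, max_genes=3))
```

Each stage uses only the models that passed the previous stage. The search therefore stays small even though the full number of gene combinations grows quickly.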

Again, without taking group membership (Normals vs. RAs) into account in the estimation of the model parameters, 4-gene modeling perfectly distinguished the 2 groups. In one model, the most significant gene was ICAM. Given ICAM, a second gene, HLADRA, made the most significant incremental contribution to discrimination between the Normals and RAs. Including a 3^(rd) gene, HSPA1A, made the next most significant incremental contribution, and including a 4^(th) gene, TGFB1, provided perfect discrimination. In another model, the 4 genes CSPG2, HLADRA, CD14, and ITGAL provided perfect discrimination between the Normal and RA populations. Other 4-gene models are described in Table 10.

Example 7 Clinical Data Analyzed with Latent Class Modeling: Normal v. Unstable RA

To test whether the results described above could be generalized beyond the ‘washed-out’ RAs, Normals and RAs from an ‘unstable RA’ study were examined. NFKB1 expression was not measured in this study. Therefore, data from its 27 Normals and 10 unstable RAs, all of whom had non-missing values for both TLR2 and CD4, were used to validate the results from the 2-gene model. Compared to the 134 Normals from the ‘washed-out’ RA study, these Normals were similar with respect to the estimated variances and correlation (r = 0.59) of these gene expressions. However, they had significantly higher mean expression levels for both CD4 and TLR2: means (CD4, TLR2) = (15.6, 16.5) vs. (14.8, 16.1). See FIG. 32.

‘Unstable’ RAs with non-missing values on CD4 and TLR2 (N=10, depicted as stars in FIG. 33) are indistinguishable from ‘washed-out’ RAs (N=22, depicted as Xs in FIG. 33) with respect to mean expression levels and variances for CD4 and TLR2. In addition, both groups show zero correlation between CD4 and TLR2, unlike the significant positive correlation for Normals. See FIG. 33.

FIG. 34 shows that Unstable RAs have significantly lower levels than Normals on both genes. FIG. 35 shows the data for both studies together. These results suggest that the model can probably be generalized to apply to unstable RAs as well.

Example 8 Estimating a Latent Class Model

Assume that the gene expressions follow a multivariate normal (MVN) distribution within the Normal population. Assume also that an MVN distribution characterizes the RA population, with different means and possibly different variances and correlations. Under these assumptions, it is possible to develop a model that discriminates between the 2 groups without using information about group membership.

Latent class (LC)/finite mixture modeling was used to examine the number of latent classes in the data under the above assumptions and to determine how closely these classes correspond to the Normals and RAs. The possibility of selection effects was minimized by not using group membership in the model estimation and then assessing the extent to which the model could predict group membership. Baseline gene expressions on tPn and CD4 for RAs and Normals were used, without identifying group membership. The resulting model estimates the probability of being in the RA group (vs. Normal) under the above distributional assumptions. This methodology is similar to that employed by Vermunt and Magidson (2004), in which an LC model accurately classified individuals into the appropriate group (normal, overt diabetes, chemical diabetes) on the basis of 3 clinical measures (glucose, insulin, SSPG).

Given values for tPn and CD4, the Bayesian Information Criterion (BIC) correctly identified 2 classes in the data (the 2-class model had the smallest BIC among models containing between 1 and 4 classes). The first class consisted of all the Normals and the second of all the RAs. The estimated parameters from this model (means, variances and correlations within each group) can be used to construct a 95% confidence region for the Normals (FIG. 36). More specifically, it was assumed that within the Normal population, tPn and CD4 follow a bivariate normal distribution with means (tPn, CD4) = (33.5, 14.8), variances (1.33, 0.245) and correlation r = 0.74. As can be seen in FIGS. 37 and 38, all of the RAs fall outside this region. As expected, some of the Normals also fall outside the 95% confidence region (approximately 5% would be expected to fall outside it by chance). However, none of these falls below the discrimination line.
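The 95% region for the Normals can be expressed as a bound on the squared Mahalanobis distance from the Normal-class mean. This minimal Python sketch uses the Normal-class parameters reported above, and its function names are hypothetical. The closed-form threshold −2 ln(0.05) ≈ 5.99 is the 0.95 quantile of a chi-square distribution with 2 degrees of freedom, which applies to a bivariate normal with known parameters. Treating the estimates as known is an assumption.

```python
import math

# Normal-class parameters for (tPn, CD4) reported above.
MEAN = (33.5, 14.8)
VAR = (1.33, 0.245)
R = 0.74

def mahalanobis_sq(tpn, cd4, mean=MEAN, var=VAR, r=R):
    """Squared Mahalanobis distance of (tPn, CD4) from the Normal-class mean."""
    z1 = (tpn - mean[0]) / math.sqrt(var[0])
    z2 = (cd4 - mean[1]) / math.sqrt(var[1])
    return (z1 * z1 - 2 * r * z1 * z2 + z2 * z2) / (1 - r * r)

def inside_normal_region(tpn, cd4, level=0.95):
    """True if the point lies inside the bivariate normal concentration ellipse."""
    # For 2 dimensions the chi-square quantile has the closed form -2 ln(1 - level).
    return mahalanobis_sq(tpn, cd4) <= -2 * math.log(1 - level)

print(round(-2 * math.log(0.05), 3))        # 5.991
print(inside_normal_region(33.5, 14.8))     # True: the Normal-class mean
print(inside_normal_region(30.0, 14.8))     # False: tPn far below the Normal mean
```

Because r = 0.74 is large, the ellipse is elongated along the direction in which tPn and CD4 rise together. A case that is one standard deviation high on tPn but one standard deviation low on CD4 falls outside the region, even though each value alone looks ordinary.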

Next, MS subjects (n=11) were included among the sample subjects, and the LC analysis was repeated. The results again indicated 2 classes (see Table 11; BIC is lowest for the 2-class model). Class 1 again corresponded to the Normals: as shown in Table 12, all of the Normals were classified into this class. The parameter estimates were very similar to those obtained without the MS subjects, i.e., the model described supra. As before, all of the RAs were classified into class 2. The posterior probabilities are given in Tables 12, 13, and 14. Of the 11 MS subjects, 8 are assigned to the Normal class (for each of these, the estimated posterior probability of membership in this class is 0.89 or higher), and the other 3 are classified into class 2 with the RA subjects. FIG. 37 shows the separation between the 3 MS subjects who look like RA subjects and the other 8 MS subjects who look more like Normals.
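Posterior class probabilities, such as those reported in Tables 12 to 14, follow from Bayes' rule applied to the fitted mixture. This minimal Python sketch covers a 2-class bivariate normal mixture on (tPn, CD4). The Normal-class parameters are those reported above. The RA-class parameters and the class weights are hypothetical placeholders, since they are not given in this section. Working on the log scale avoids underflow for cases far from both class means.

```python
import math

def bvn_logpdf(x, y, mean, var, r):
    """Log density of a bivariate normal with given means, variances and correlation."""
    sx, sy = math.sqrt(var[0]), math.sqrt(var[1])
    z1, z2 = (x - mean[0]) / sx, (y - mean[1]) / sy
    q = (z1 * z1 - 2 * r * z1 * z2 + z2 * z2) / (1 - r * r)
    return -0.5 * q - math.log(2 * math.pi * sx * sy * math.sqrt(1 - r * r))

def posterior(x, y, classes):
    """classes: list of (weight, mean, var, r); returns posterior class probabilities."""
    logs = [math.log(w) + bvn_logpdf(x, y, m, v, r) for w, m, v, r in classes]
    top = max(logs)
    scaled = [math.exp(l - top) for l in logs]
    total = sum(scaled)
    return [s / total for s in scaled]

CLASSES = [
    (134 / 156, (33.5, 14.8), (1.33, 0.245), 0.74),   # Normal class (means etc. reported above)
    (22 / 156, (31.0, 15.1), (1.50, 0.300), 0.0),     # RA class: HYPOTHETICAL values
]
print([round(p, 3) for p in posterior(33.5, 14.8, CLASSES)])
```

In an actual LC analysis, the class weights, means, variances and correlations are all estimated jointly, for example by maximum likelihood. This sketch covers only the final classification step.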

Assuming that these 3 MS subjects do not in fact have RA, these data suggest that one or more additional genes could be included in the model to distinguish the RAs from those MS subjects who may have similar gene expressions on CD4 and tPn.

In the present RA model, the following inferences appear justified (at least for a large subset of the genes):

-   1) The gene expressions within the normal population follow a multivariate normal (MVN) distribution.
-   2) The gene expressions within one or more non-normal populations follow a multivariate normal distribution, possibly with different means, variances and correlations.

Thus, a 95% ‘concentration ellipsoid’ based on G genes, which would be expected to contain 95% of the Normals, can be constructed from the sample of N₁=134 Normals. Suppose these genes are selected so that their MVN parameters (means, variances, correlations) differ in significant ways from those of the non-normal population being studied. If, in addition, the sample sizes N₁ and N₂ (for the non-normals) are sufficiently large, the gene expressions for each non-normal case would be expected to fall outside the 95% normal confidence bounds.
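For G genes, the same idea uses the full covariance matrix and a chi-square quantile with G degrees of freedom. A minimal pure-Python sketch follows. The Wilson-Hilferty approximation is used for the chi-square quantile because the standard library has no chi-square inverse; an exact value could come from, e.g., `scipy.stats.chi2.ppf`. The small Gaussian-elimination solver is only suitable for a handful of genes. Function names and the example covariance are hypothetical.

```python
import math
from statistics import NormalDist

def chi2_quantile_wh(p, k):
    """Wilson-Hilferty approximation to the chi-square quantile with k degrees of freedom."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * k)
    return k * (1.0 - c + z * math.sqrt(c)) ** 3

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small G only)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def outside_ellipsoid(x, mean, cov, level=0.95):
    """True if x lies outside the `level` concentration ellipsoid of an MVN(mean, cov)."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    w = solve(cov, d)                     # w = cov^{-1} d
    d2 = sum(di * wi for di, wi in zip(d, w))
    return d2 > chi2_quantile_wh(level, len(x))

# Hypothetical 3-gene example with standardized expressions.
mean = [0.0, 0.0, 0.0]
cov = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(round(chi2_quantile_wh(0.95, 3), 2))
print(outside_ellipsoid([0.1, 0.2, -0.1], mean, cov))
print(outside_ellipsoid([3.0, -3.0, 0.0], mean, cov))
```

The threshold grows with G, so each added gene must shift the non-normal population enough to outweigh the larger region it creates.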

Example 9 Relationship Between Gene Expressions and Clinical Outcomes for Washed-Out RA Subjects

In a related study, various clinical outcomes were considered, with a view to identifying the relationships between the gene expressions at baseline and the clinical outcomes. These outcomes included:

-   American College of Rheumatology score (ACR)
-   C-Reactive Protein level (W-CRP)
-   Disease Activity Score (W-DAS)
-   Erythrocyte Sedimentation Rate (W-ESR)
-   Health Assessment Questionnaire (W-HAQ)
-   Physician's assessment of disease (PhysAssessDisease)
-   Subject's assessment of disease (SubAssessDisease)
-   Subject's assessment of pain (SubAssessPain)
-   swollen joint count (SwollJoints)
-   tender joint count (TenderJoints)

Since these outcome variables are not normally distributed, an ordinal logistic regression model (see, e.g., Magidson, 1996) was used rather than linear regression to predict them as a function of the gene expressions. Again, the stepwise procedure within the GOLDMineR program was used to perform the analysis. Because of the small sample size (N=22), at most two predictors were allowed to enter the model, to reduce the likelihood of obtaining spurious results.
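For readers less familiar with ordinal logistic regression, the following minimal Python sketch shows how a cumulative (proportional odds) logit model turns a linear predictor into probabilities for ordered outcome categories. The parameterization P(Y ≤ k) = logistic(θ_k − η) is one common convention and is an assumption here. GOLDMineR's own parameterization and estimation procedure may differ. The cutpoints and coefficients are hypothetical.

```python
import math

def logistic(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def ordinal_probs(eta, cutpoints):
    """Category probabilities under P(Y <= k) = logistic(theta_k - eta).

    cutpoints must be strictly increasing; K cutpoints give K + 1 categories.
    A larger eta shifts probability toward the higher categories.
    """
    cumulative = [logistic(theta - eta) for theta in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs

def eta_swollen(il8, tlr2, b_il8=0.8, b_tlr2=-0.6):
    """Hypothetical two-gene predictor: rises with IL8 and falls with TLR2 (made-up coefficients)."""
    return b_il8 * il8 + b_tlr2 * tlr2

cuts = [5.0, 7.0, 9.0]      # hypothetical cutpoints for 4 ordered categories
print([round(p, 3) for p in ordinal_probs(eta_swollen(21.0, 15.0), cuts)])
```

A single set of gene coefficients applies to every cutpoint, so each predictor needs only one parameter. This matters with N=22.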

At baseline (time=0), significant relationships with the gene expressions were found for 8 of the 10 clinical outcomes. However, the genes found to be significant were not necessarily those found to discriminate between Normals and RAs. For subject's assessment of disease, TLR2 is the gene most highly related to this outcome (p=0.01). As noted earlier, TLR2 is the most significant discriminator between RAs and Normals. Other outcomes show significant relationships with other genes. For example, if the goal were to predict W-CRP or the SwollenJoints score, the best of the 48 genes would be IL8. However, IL8 is not significantly different between RAs and Normals (IL8 expression has a mean value of 21.0 for both groups). For the prediction of the Swollen Joints score, TLR2 enters the stepwise model second, following IL8: the swollen joint score increases with higher values of IL8 and lower values of TLR2. See Table 15 for a summary of results from a stepwise ordinal logit model. Thus, while the relatively small sample of RAs does not allow definitive conclusions, it is clear that there are strong relationships between the clinical outcomes and the gene expressions. Only 2 measures (Sharp score and HAQ) were not found to be significantly related to any of the 48 genes measured.

Example 10 Simulated Effect of Error in Data

A study simulating the effect of error in gene expression data was conducted. For simplicity, the study was limited to a single realization, using only one set of random numbers as opposed to hundreds or thousands of replications. Random error was added only to the 3 genes used in the RA vs. Normals model discussed above. This random error generation was applied 4 times, generating what might be considered a ‘small’ amount of error (s=0.2 below), a moderate amount (s=0.5), a large amount (s=1) and a very large amount (s=2), where s is the standard deviation of the added error.

To relate the magnitude of “s” to the ability of the less precise measure to reproduce the more precise measure for each of the 3 genes, two tables are used. Table 16 relates each value of “s” to the average “% CD” standard used in the analysis. Table 17 below relates “s” to the percentage of the variance of the more precise measure that is reproduced by the less precise measure. For these purposes, the “% CD” standard is the standard deviation s divided by the mean of the gene expression data. Table 16 calculates these average “% CD” values based on the mean for each of the 3 genes.
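The “% CD” quantity defined above is a simple ratio, and Table 16 averages it over the 3 genes. This minimal Python sketch uses hypothetical function names. The example mean of 16.1 is the Normals' TLR2 mean quoted earlier and serves only as an illustration, since the exact means used for Table 16 are not given here.

```python
def pct_cd(s, gene_mean):
    """'% CD': error standard deviation s as a percentage of the gene's mean."""
    return 100.0 * s / gene_mean

def average_pct_cd(s, gene_means):
    """Average '% CD' over several genes for one value of s."""
    return sum(pct_cd(s, m) for m in gene_means) / len(gene_means)

for s in (0.2, 0.5, 1.0, 2.0):
    print(s, round(pct_cd(s, 16.1), 2))
```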

Let Y(j) represent the measure developed according to the practices described herein for the expression of gene j. The sample consists of N=156 subjects (comprising N₁=134 Normals and N₂=22 RAs). For each subject i and each of the 3 genes j=1, 2, 3, the value y_i(j) is observed, so that the vector Y(j) = [y_1(j), y_2(j), ..., y_N(j)] is observed for each gene. Suppose there is a less precise measurement of gene j, Y′(j). It is simulated from the measured data by adding a random error e that is independently and identically distributed according to the normal distribution with mean 0 and standard deviation s:

y_i′(j) = y_i(j) + e_i(j),  i = 1, 2, ..., N;  j = 1, 2, 3.

The larger the value of s, the larger the contribution made by the error, i.e., the more error is in the data. Setting the mean to 0 makes the less precise measure unbiased; that is, the only difference between the two measurements is that one is more precise than the other. Four sets of data were generated, corresponding to the values s = 0.2, 0.5, 1.0 and 2.0, which yield successively larger amounts of error.

In vector form, Y′(j) = Y(j) + error, where the error is independently and identically distributed as Normal with mean 0 and standard deviation s. The higher the value of s, the more error in the measurement. Since the mean error is zero, the less precise alternative measurement is unbiased. (Note: The square of the correlation between Y(j) and Y′(j) is a measure of the percentage of the variance of Y(j) that is captured by Y′(j). Its theoretical value is VAR(Y)/[VAR(Y) + s²]. These quantities are displayed in Table 17 below, followed by the observed squared correlations based on the generated data in Table 18.)
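The error-simulation scheme and the variance-capture formula can be checked numerically. In this minimal Python sketch, the “precise” expression values are synthetic draws, not the study data. Their mean of 16.1 and standard deviation of 0.79 are borrowed from the TLR2 figures in this example purely for illustration. As in the study, a single seeded realization is generated per value of s, and the observed squared correlation is compared with VAR(Y)/[VAR(Y) + s²]. Function names are hypothetical.

```python
import random

def add_measurement_error(values, s, rng):
    """y' = y + e with e ~ Normal(0, s), drawn independently for each value."""
    return [v + rng.gauss(0.0, s) for v in values]

def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def theoretical_r2(var_y, s):
    """Share of VAR(Y) reproduced by the less precise measure Y'."""
    return var_y / (var_y + s * s)

rng = random.Random(2005)
y = [rng.gauss(16.1, 0.79) for _ in range(156)]       # synthetic, N = 156
mean_y = sum(y) / len(y)
var_y = sum((v - mean_y) ** 2 for v in y) / (len(y) - 1)
for s in (0.2, 0.5, 1.0, 2.0):
    y_noisy = add_measurement_error(y, s, rng)
    print(s, round(theoretical_r2(var_y, s), 3), round(squared_correlation(y, y_noisy), 3))
```

With only one realization per s, the observed and theoretical values can differ noticeably at N=156. The difference shrinks as N grows.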

The results of this simulation show that accurate measurement is critical to discrimination. The standard deviation of the measurements increases as larger amounts of error are added (see Table 19). For example, for TLR2, the standard deviation increases from 0.79 to 0.83, 0.90, 1.25 and 1.93. The R-square statistic measures the extent to which the 3 genes discriminate between the RAs and Normals. It falls from 1 (perfect discrimination) to 0.87, 0.55, 0.33 and 0.23 as larger amounts of error are added to the data. Only when the amount of error is ‘small’ are 2 classes distinguished by a latent class analysis; in the other situations, the data do not contain sufficient differentiation to identify that they come from 2 populations. Using the ‘small error’ data, a latent class analysis would correctly identify 2 classes, but the 2-class model would misclassify 2 Normals and 14 RAs. If the linear algorithm from the original model were used with these data, 2 RAs and 2 Normals would be misclassified, as shown in FIG. 76. As the amount of error increases, applying the original linear algorithm to the simulated data misclassifies increasing numbers of RAs and Normals. Specifically, 4 misclassifications occur with the small amount of error (2 RAs and 2 Normals), increasing to 12, 42 and 47, respectively, as more error is added.
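The last part of the simulation applies a fixed classification rule to increasingly noisy data and counts the errors. It can be sketched as follows. Everything here is synthetic. The two groups are generated so that the hypothetical linear rule separates them perfectly before noise is added, mirroring the perfect separation of the original data. The weights, margin and group size are illustrative assumptions rather than the study's model.

```python
import random

def linear_score(weights, bias):
    """Return a function computing weights . x + bias for a point x."""
    return lambda x: sum(w * v for w, v in zip(weights, x)) + bias

def perturb(points, s, rng):
    """Add independent Normal(0, s) error to every coordinate."""
    return [tuple(v + rng.gauss(0.0, s) for v in p) for p in points]

def count_misclassified(points, labels, rule):
    """Number of points whose predicted class differs from the true label."""
    return sum(rule(p) != label for p, label in zip(points, labels))

def make_separated_sample(score, n, margin, rng):
    """Draw 2-D points, keeping only those at least `margin` from the boundary."""
    points, labels = [], []
    while len(points) < n:
        p = (rng.gauss(0.0, 1.5), rng.gauss(0.0, 1.5))
        value = score(p)
        if abs(value) >= margin:
            points.append(p)
            labels.append(value > 0)
    return points, labels

score = linear_score((1.0, -1.0), 0.0)

def classify(p):
    """Hypothetical fixed rule: class True when the linear score is positive."""
    return score(p) > 0

rng = random.Random(7)
points, labels = make_separated_sample(score, 156, 0.5, rng)
for s in (0.0, 0.2, 0.5, 1.0, 2.0):
    print(s, count_misclassified(perturb(points, s, rng), labels, classify))
```

The rule itself never changes across the runs; only the precision of the inputs does. This isolates the effect of measurement error on discrimination.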

FIGS. 38-41 show the plots corresponding to FIG. 29 as applied to thesimulated data.

These data support the conclusion that Gene Expression Profiles with sufficient precision and calibration as described herein:

-   (1) can determine subsets of individuals with a known biological condition, particularly individuals with rheumatoid arthritis or with inflammatory conditions related to rheumatoid arthritis;
-   (2) may be used to monitor the response of patients to therapy;
-   (3) may be used to assess the efficacy and safety of therapy; and
-   (4) may be used to guide the medical management of a patient by adjusting therapy to bring one or more relevant Gene Expression Profiles closer to a target set of values, which may be normative values or other desired or achievable values.

Gene Expression Profiles are used for characterization and monitoring of treatment efficacy of individuals with rheumatoid arthritis, or individuals with inflammatory conditions related to rheumatoid arthritis. Use of the algorithmic and statistical approaches discussed above to achieve such identification and to discriminate in such fashion is within the scope of various embodiments herein.

The references listed below are hereby incorporated herein by reference.

REFERENCES

-   Magidson, J. GOLDMineR User's Guide (1998). Belmont, Mass.: Statistical Innovations Inc.
-   Vermunt, J. K. and J. Magidson. Latent GOLD 4.0 User's Guide (2005). Belmont, Mass.: Statistical Innovations Inc.
-   Vermunt, J. K. and J. Magidson. Technical Guide for Latent GOLD 4.0: Basic and Advanced (2005). Belmont, Mass.: Statistical Innovations Inc.
-   Vermunt, J. K. and J. Magidson. Latent Class Cluster Analysis (2002). In J. A. Hagenaars and A. L. McCutcheon (eds.), Applied Latent Class Analysis, 89-106. Cambridge: Cambridge University Press.
-   Magidson, J. “Maximum Likelihood Assessment of Clinical Trials Based on an Ordered Categorical Response.” (1996) Drug Information Journal, Maple Glen, Pa.: Drug Information Association, Vol. 30, No. 1, pp. 143-170.

TABLE 1 Rheumatoid Arthritis or Inflammatory Conditions Related toRheumatoid Arthritis Gene Expression Panel Symbol Name ClassificationDescription 1 APAF1 Apoptotic Protease Protease activating Cytochrome cbinds to Activating Factor 1 peptide APAF1, triggering activation ofCASP3, leading to apoptosis. May also facilitate procaspase 9 autoactivation. 2 BCL2 B-cell CLL/ Apoptosis Inhibitor - Blocks apoptosis bylymphoma 2 cell cycle control - interfering with the oncogenesisactivation of caspases 3 BPI Bactericidal/permeability- Membrane-boundLPS binding protein; increasing protease cytotoxic for many gram proteinnegative organisms; found in myeloid cells 4 C1QA Complement Proteinase/Serum complement system; component 1, q proteinase inhibitor forms C1complex with the subcomponent, alpha proenzymes c1r and c1s polypeptide5 CASP1 Caspase 1 Proteinase Activates IL1B; stimulates apoptosis 6CASP3 Caspase 3 Proteinase/ Involved in activation Proteinase Inhibitorcascade of caspases responsible for apoptosis - cleaves CASP6, CASP7,CASP9 7 CASP9 Caspase 9 Proteinase Binds with APAF1 to become activated;cleaves and activates CASP3 8 CCL1 Chemokine (C-C Cytokines/ Secreted byactivated T Motif) ligand 1 Chemokines/ cells; chemotactic for GrowthFactors monocytes, but not neutrophils; binds to CCR8 9 CCL2 Chemokine(C-C Cytokines- CCR2 chemokine; Recruits Motif) ligand 2 chemokines-monocytes to areas of injury growth factors and infection; Upregulatedin liver inflammation; Stimulates IL-4 production; Implicated indiseases involving monocyte, basophil infiltration of tissue (e.g,.psoriasis, rheumatoid arthritis, atherosclerosis) 10 CCL3 Chemokine (C-CCytokines/ AKA: MIP1-alpha; monkine motif) ligand 3 Chemokines/ thatbinds to CCR1, CCR4 Growth Factors and CCR5; major HIV- suppressivefactor produced by CD8 cells. 
11 CCL4 Chemokine (C-C Cytokines/Inflammatory and Motif) ligand 4 Chemokines/ chemotactic monokine;Growth Factors binds to CCR5 and CCR8 12 CCL5 Chemokine (C-C Cytokines/Binds to CCR1, CCR3, and Motif) ligand 5 Chemokines/ CCR5 and is aGrowth Factors chemoattractant for blood monocytes, memory T- helpercells and eosinophils; A major HIV-suppressive factor produced by CD8-positive T-cells 13 CCR3 Chemokine (C-C Chemokine receptor C-C typechemokine motif) receptor 3 receptor (Eotaxin receptor) binds toEotaxin, Eotaxin-3, MCP-3, MCP-4, SCYA5/RANTES and mip-1 delta therebymediating intracellular calcium flux. Alternative co-receptor with CD4for HIV-1 infection. Involved in recruitment of eosinophils. Primarily aTh2 cell chemokine receptor. 14 CD14 CD14 antigen Cell Marker LPSreceptor used as marker for monocytes 15 CD19 CD19 antigen Cell MarkerAKA Leu 12; B cell growth factor 16 CD3Z CD3 antigen, zeta Cell MarkerT-cell surface glycoprotein polypeptide 17 CD4 CD4 antigen (p55) CellMarker Helper T-cell marker 18 CD86 CD 86 Antigen (cD Cell signaling andAKA B7-2; membrane 28 antigen ligand) activation protein found in Blymphocytes and monocytes; co-stimulatory signal necessary for Tlymphocyte proliferation through IL2 production. 19 CD8A CD8 antigen,alpha Cell Marker Suppressor T cell marker polypeptide 20 CKS2 CDC28protein Cell signaling and Essential for function of kinase regulatoryactivation cyclin-dependent kinases subunit 2 21 CSF2 Granulocyte-Cytokines/ AKA GM-CSF; monocyte colony Chemokines/ Hematopoietic growthstimulating factor Growth Factors factor; stimulates growth anddifferentiation of hematopoietic precursor cells from various lineages,including granulocytes, macrophages, eosinophils, and erythrocytes 22CSF3 Colony stimulating Cytokines/ AKA GCSF controls factor 3Chemokines/ production ifferentiation and (granulocyte) Growth Factorsfunction of granulocytes. 
23 CSPG2 Chondroitin Sulfate CellAdhesion/Cell Versican is 1 of the main Proteoglycan 2 Recognition genesupregulated after (versican) vascular injury 24 CXCL1 Chemokine (C-X-C-Cytokines/ Melanoma growth motif) ligand 1 Chemokines/ stimulatingactivity, alpha; Growth Factors Chemotactic proinflammatory activation-inducible cytokine. 25 CXCL3 Chemokine Cytokines/ Chemotacticproinflammatory (C-X-C-motif) Chemokines/ activation- ligand 3 GrowthFactors inducible cytokine, acting primarily upon hemopoietic cells inimmunoregulatory processes, may also play a role in inflammation andexert its effects on endothelial cells in an autocrine fashion. 26CXCL10 Chemokine (C-X-C Cytokines/ AKA: Gamma IP10; motif) ligand 10Chemokines/ interferon inducible cytokine Growth Factors IP10; SCYB10;Ligand for CXCR3; binding causes stimulation of monocytes, NK cells;induces T cell migration 27 DPP4 Dipeptidyl-peptidase 4 Membraneprotein; Removes dipeptides from exopeptidase unmodified, n-terminusprolines; has role in T cell activation 28 ELA2 Elastase 2, neutrophilProtease Modifies the functions of NK cells, monocytes and granulocytes29 EGR1 Early Growth Tumor Suppressor The protein encoded by thisResponse 1 gene belongs to the EGR family of C2H2-type zinc- fingerproteins. It is a nuclear protein and functions as a transcriptionalregulator. 30 HIST1H1C Histone 1, H1c Basic nuclear protein Responsiblefor the nucleosome structure within the chromosomal fiber in eukaryotes;may attribute to modification of nitrotyrosine-containing proteins andtheir immunoreactivity to antibodies against nitrotyrosine 31 HLA-DRAmajor Membrane protein; HLA-DRA is one of the histocompatibility antigenprocessing HLA class II alpha chain complex, class II, paralogues. 
Itplays a central DR alpha role in the immune system by presentingpeptides derived from extracellular proteins 32 HMOX1 Heme oxygenaseEnzyme/Redox Endotoxin inducible (decycling) 1 33 HSPA1A Heat shockprotein Cell Signaling and heat shock protein 70 kDa; 70 activationMolecular chaperone, stabilizes AU rich mRNA 34 ICAM1 Intercellular CellAdhesion/ Endothelial cell surface adhesion molecule 1 Matrix Proteinmolecule; regulates cell adhesion and trafficking, unregulated duringcytokine stimulation 35 IFI16 Gamma interferon Cell signaling andTranscriptional repressor inducible protein 16 activation 36 IFNA2Interferon, alpha 2 Cytokines/ interferon produced by Chemokines/macrophages with antiviral Growth Factors effects 37 IFNG Interferon,Gamma Cytokines/ Pro- and anti-inflammatory Chemokines/ activity; TH1cytokine; Growth Factors nonspecific inflammatory mediator; produced byactivated T-cells. 38 IL10 Interleukin 10 Cytokines/ Anti-inflammatory;TH2; Chemokines/ suppresses production of Growth Factors proinflammatorycytokines 39 IL12B Interleukin 12 p40 Cytokines/ Proinflammatory;mediator Chemokines/ of innate immunity, TH1 Growth Factors cytokine,requires co- stimulation with IL-18 to induce IFN-g 40 IL13 Interleukin13 Cytokines/ Inhibits inflammatory Chemokines/ cytokine productionGrowth Factors 41 IL18 Interleukin 18 Cytokines/ Proinflammatory, TH1,Chemokines/ innate and acquired Growth Factors immunity, promotesapoptosis, requires co- stimulation with IL-1 or IL-2 to induce TH1cytokines in T- and NK-cells 42 IL18RI Interleukin 18 Membrane proteinReceptor for interleukin 18; receptor 1 binding the agonist leads toactivation of NFKB-B; belongs to IL1 family but does not bind IL1A orIL1B. 43 IL1A Interleukin 1, alpha Cytokines- Proinflammatory;chemokines-growth constitutively and inducibly factors expressed invariety of cells. 
Generally cytosolic and released only during severeinflammatory disease 44 IL1B Interleukin 1, beta Cytokines/Proinflammatory; constitutively Chemokines/ and inducibly expressedGrowth Factors by many cell types, secreted 45 IL1R1 Interleukin 1 Cellsignaling and AKA: CD12 or IL1R1RA; receptor, type I activation Bindsall three forms of interleukin-1 (IL1A, IL1B and IL1RA). Binding ofagonist leads to NFKB activation 46 IL1RN Interleukin 1 Cytokines/ IL1receptor antagonist; Receptor Antagonist Chemokines/ Anti-inflammatory;inhibits Growth Factors binding of IL-1 to IL-1 receptor by binding toreceptor without stimulating IL-1-like activity 47 IL2 Interleukin 2Cytokines/ T-cell growth factor, Chemokines/ expressed by activated T-Growth Factors cells, regulates lymphocyte activation anddifferentiation; inhibits apoptosis, TH1 cytokine 48 IL4 Interleukin 4Cytokines/ Anti-inflammatory; TH2; Chemokines/ suppressesproinflammatory Growth Factors cytokines, increases expression ofIL-1RN, regulates lymphocyte activation 49 IL5 Interleukin 5 Cytokines/Eosinophil stimulatory Chemokines/ factor; stimulates late B cell GrowthFactors differentiation to secretion of Ig 50 IL6 Interleukin 6Cytokines- Pro- and anti-inflammatory (interferon, beta 2)chemokines-growth activity, TH2 cytokine, factors regulateshematopoietic system and activation of innate response 51 IL8Interleukin 8 Cytokines- Proinflammatory, major chemokines-growthsecondary inflammatory factors mediator, cell adhesion, signaltransduction, cell-cell signaling, angiogenesis, synthesized by a widevariety of cell types 52 IRF7 Interferon regulatory Transcription FactorRegulates transcription of factor 7 interferon genes through DNAsequence-specific binding. Diverse roles include virus-mediatedactivation of interferon, and modulation of cell growth,differentiation, apoptosis, and immune system activity. 
53 | LTA | Lymphotoxin alpha (TNF superfamily, member 1) | Cytokine | Cytokine secreted by lymphocytes and cytotoxic for a range of tumor cells; active in vitro and in vivo
54 | LTB | Lymphotoxin beta (TNFSF3) | Cytokine | Inducer of inflammatory response and normal lymphoid tissue development
55 | JUN | v-jun avian sarcoma virus 17 oncogene homolog | Transcription factor-DNA binding | Proto-oncoprotein; component of transcription factor AP-1 that interacts directly with target DNA sequences to regulate gene expression
56 | MEF2C | MADS box transcription enhancer factor 2, polypeptide C (myocyte enhancer factor 2C) | Transcription factor-DNA binding | Transcription activator which binds specifically to the mef2 element present in the regulatory regions of many muscle-specific genes
57 | MIF | Macrophage migration inhibitory factor | Cell signaling and growth factor | AKA: GIF; lymphokine, regulates macrophage functions through suppression of anti-inflammatory effects of glucocorticoids
58 | MMP9 | Matrix metalloproteinase 9 | Proteinase/Proteinase Inhibitor | AKA gelatinase B; degrades extracellular matrix molecules, secreted by IL-8-stimulated neutrophils
59 | N33 | Tumor suppressor candidate 3 | Tumor Suppressor | Integral membrane protein. Associated with homozygous deletion in metastatic prostate cancer.
60 | NFKB1 | Nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 (p105) | Transcription Factor | p105 is the precursor of the p50 subunit of the nuclear factor NFKB, which binds to the kappa-b consensus sequence located in the enhancer region of genes involved in immune response and acute phase reactions; the precursor does not bind DNA itself
61 | NFKBIB | Nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, beta | Transcription Regulator | Inhibits/regulates NFKB complex activity by trapping NFKB in the cytoplasm. Phosphorylated serine residues mark the NFKBIB protein for destruction, thereby allowing activation of the NFKB complex.
62 | PF4 | Platelet Factor 4 (SCYB4) | Chemokine | PF4 is released during platelet aggregation and is chemotactic for neutrophils and monocytes. PF4's major physiologic role appears to be neutralization of heparin-like molecules on the endothelial surface of blood vessels, thereby inhibiting local antithrombin III activity and promoting coagulation.
63 | PI3 | Proteinase inhibitor 3, skin derived | Proteinase inhibitor-protein binding-extracellular matrix | AKA SKALP; Proteinase inhibitor found in epidermis of several inflammatory skin diseases; its expression can be used as a marker of skin irritancy
64 | PLA2G7 | Phospholipase A2, group VII (platelet activating factor acetylhydrolase, plasma) | Enzyme/Redox | Platelet activating factor
65 | LTA | Lymphotoxin alpha (TNF superfamily, member 1) | Cytokines/Chemokines/Growth Factors | LTA mediates a large variety of inflammatory, immunostimulatory, and antiviral responses. LTA also plays a role in apoptosis
66 | PTGS2 | Prostaglandin-endoperoxide synthase 2 | Enzyme | Cytokine secreted by lymphocytes and cytotoxic for a range of tumor cells; active in vitro and in vivo
67 | PTX3 | Pentaxin-related gene, rapidly induced by IL-1 beta | Acute Phase Protein | Inducer of inflammatory response and normal lymphoid tissue development
68 | RAD52 | RAD52 homolog (S. cerevisiae) | DNA binding proteinsor | Involved in DNA double stranded break repair and meiotic/mitotic recombination
69 | SERPINE1 | Serine (or cysteine) protease inhibitor, clade B (ovalbumin), member 1 | Proteinase/Proteinase Inhibitor | Plasminogen activator inhibitor-1/PAI-1
70 | SLC7A1 | Solute carrier family 7, member 1 | Membrane protein; permease | High affinity, low capacity permease involved in the transport of positively charged amino acids
71 | STAT3 | Signal transducer and activator of transcription 3 | Transcription factor | AKA APRF: Transcription factor for acute phase response genes; rapidly activated in response to certain cytokines and growth factors; binds to IL6 response elements
72 | TGFB1 | Transforming growth factor, beta 1 | Cytokines/Chemokines/Growth Factors | Pro- and anti-inflammatory activity, anti-apoptotic; cell-cell signaling, can either inhibit or stimulate cell growth
73 | TGFBR2 | Transforming growth factor, beta receptor II | Membrane protein | AKA: TGFR2; membrane protein involved in cell signaling and activation, ser/thr protease; binds to DAXX.
74 | TIMP1 | Tissue inhibitor of metalloproteinase 1 | Proteinase/Proteinase Inhibitor | Irreversibly binds and inhibits metalloproteinases, such as collagenase
75 | TLR2 | Toll-like receptor 2 | Cell signaling and activation | Mediator of peptidoglycan and lipoteichoic acid induced signaling
76 | TNF | Tumor necrosis factor | Cytokine/tumor necrosis factor receptor ligand | Negative regulation of insulin action. Produced in excess by adipose tissue of obese individuals; increases IRS-1 phosphorylation and decreases insulin receptor kinase activity.
77 | TNFRSF7 | Tumor necrosis factor receptor superfamily, member 7 | Membrane protein; receptor | Receptor for CD27L; may play a role in activation of T cells
78 | TNFSF13B | Tumor necrosis factor (ligand) superfamily, member 13b | Cytokines/Chemokines/Growth Factors | B cell activating factor, TNF family
79 | TNFRSF13B | Tumor necrosis factor receptor superfamily, member 13, subunit beta | Cytokines/Chemokines/Growth Factors | B cell activating factor, TNF family
80 | TNFSF5 | Tumor necrosis factor (ligand) superfamily, member 5 | Cytokines/Chemokines/Growth Factors | Ligand for CD40; expressed on the surface of T cells. It regulates B cell function by engaging CD40 on the B cell surface.
81 | TNFSF6 | Tumor necrosis factor (ligand) superfamily, member 6 | Cytokines/Chemokines/Growth Factors | AKA FasL; Ligand for FAS antigen; transduces apoptotic signals into cells

TABLE 2 Inflammation Gene Expression Panel
# | Symbol | Name | Classification | Description
1 | ADAM17 | A disintegrin and metalloproteinase domain 17 (tumor necrosis factor, alpha, converting enzyme) | Membrane protein, cell signaling | Tumor necrosis factor-alpha converting enzyme
2 | ALOX5 | Arachidonate 5-lipoxygenase | Inflammatory Response | Synthesizes leukotrienes from arachidonic acid; member of lipoxygenase gene family
3 | ANXA11 | Annexin A11 | Immune response, Calcium ion binding | 56-kD antigen recognized by sera from patients with various autoimmune diseases; member of annexin family (calcium-dependent phospholipid-binding proteins)
4 | APAF1 | Apoptotic Protease Activating Factor 1 | Protease activating peptide | Cytochrome c binds to APAF1, triggering activation of CASP3, leading to apoptosis. May also facilitate procaspase 9 autoactivation.
5 | BAX | BCL2-associated X protein | Cell cycle regulation, apoptosis induction | Forms a heterodimer with BCL2 and functions as an apoptotic activator; protein is reported to interact with, and increase the opening of, the mitochondrial voltage-dependent anion channel (VDAC), which leads to the loss in membrane potential and the release of cytochrome c; member of BCL2 protein family
6 | C1QA | Complement component 1, q subcomponent, alpha polypeptide | Proteinase/Proteinase Inhibitor | Encodes the A-chain polypeptide of human complement subcomponent C1q
7 | CASP1 | Caspase 1, apoptosis-related cysteine peptidase (interleukin 1, beta, convertase) | Proteinase | Proteolytically cleaves and activates the inactive precursor of interleukin-1; induces cell apoptosis; member of the cysteine-aspartic acid protease (caspase) family
8 | CASP3 | Caspase 3, apoptosis-related cysteine peptidase | Proteinase | Cleaves and activates caspases 6, 7 and 9. It is the predominant caspase involved in the cleavage of amyloid-beta 4A precursor protein; member of the cysteine-aspartic acid protease (caspase) family
9 | CCL2 | Chemokine (C-C motif) ligand 2 | Cytokines/Chemokines/Growth Factors | Displays chemotactic activity for monocytes and basophils but not for neutrophils or eosinophils; binds to chemokine receptors CCR2 and CCR4, member of cytokine family (involved in immunoregulatory and inflammatory processes)
10 | CCL3 | Chemokine (C-C motif) ligand 3 | Cytokines/Chemokines/Growth Factors | Monokine involved in the acute inflammatory state in the recruitment and activation of polymorphonuclear leukocytes
11 | CCL5 | Chemokine (C-C motif) ligand 5 | Cytokines/Chemokines/Growth Factors | Chemoattractant for blood monocytes, memory T helper cells and eosinophils; causes the release of histamine from basophils and activates eosinophils; one of the major HIV-suppressive factors produced by CD8+ cells; functions as one of the natural ligands for the chemokine receptor CCR5 and suppresses in vitro replication of the R5 strains of HIV-1, which use CCR5 as a coreceptor
12 | CCR3 | Chemokine (C-C motif) receptor 3 | Chemokine receptor | Binds and responds to a variety of chemokines, including eotaxin (CCL11), eotaxin-3 (CCL26), MCP-3 (CCL7), MCP-4 (CCL13), and RANTES (CCL5); highly expressed in eosinophils and basophils, and detected in TH1 and TH2 cells and airway epithelial cells; may contribute to the accumulation and activation of eosinophils and other inflammatory cells in the allergic airway; also known to be an entry co-receptor for HIV-1; member of family 1 of the G protein-coupled receptors
13 | CCR5 | Chemokine (C-C motif) receptor 5 | Chemokine receptor | Expressed by T cells and macrophages; important co-receptor for macrophage-tropic virus, including HIV, to enter host cells; expression also detected in a promyeloblastic cell line, suggesting its role in granulocyte lineage proliferation and differentiation; member of the beta chemokine receptor family
14 | CRP | C-reactive protein, pentraxin-related | Inflammatory Response, acute phase protein | Promotes agglutination, bacterial capsular swelling, phagocytosis and complement fixation through its calcium-dependent binding to phosphorylcholine; can interact with DNA and histones and may scavenge nuclear material released from damaged circulating cells
15 | CTLA4 | Cytotoxic T-lymphocyte-associated protein 4 | Membrane protein, Immune response | Costimulatory molecule expressed by activated T cells; binds to B7-1 (CD80; MIM 112203) and B7-2 (CD86; MIM 601020) on antigen-presenting cells and transmits an inhibitory signal to T cells; member of the immunoglobulin superfamily
16 | CXCL10 | Chemokine (C-X-C motif) ligand 10 | Cytokines/Chemokines/Growth Factors | Ligand for CXCR3; binding causes stimulation of monocytes, NK cells; induces T cell migration
17 | CXCL3 | Chemokine (C-X-C motif) ligand 3 | Cytokines/Chemokines/Growth Factors | Chemotactic pro-inflammatory activation-inducible cytokine, acting primarily upon hemopoietic cells in immunoregulatory processes
18 | CXCL5 | Chemokine (C-X-C motif) ligand 5 | Cytokines/Chemokines/Growth Factors | Inflammatory chemokine that belongs to the CXC chemokine family; produced concomitantly with interleukin-8 (IL8) in response to stimulation with either interleukin-1 (IL1) or tumor necrosis factor-alpha (TNFA); involved in neutrophil activation
19 | CXCR3 | Chemokine (C-X-C motif) receptor 3 | Chemokine receptor | Binding of chemokines to CXCR3 induces cellular responses that are involved in leukocyte traffic, most notably integrin activation, cytoskeletal changes and chemotactic migration; may participate in the recruitment of inflammatory cells
20 | DPP4 | Dipeptidylpeptidase 4 | Membrane protein; exopeptidase | Removes dipeptides from unmodified, n-terminus prolines; has role in T cell activation
21 | EGR1 | Early growth response-1 | Tumor Suppressor | Displays FOS-like induction kinetics in fibroblasts, epithelial cells, and lymphocytes, following mitogenic stimulation; coordinated regulation of TGFB1 and fibronectin
22 | ELA2 | Elastase 2, neutrophil | Protease | Modifies the functions of NK cells, monocytes and granulocytes
23 | FAIM3 | Fas apoptotic inhibitory molecule 3 | Cellular defense, apoptosis inhibitor | Novel regulator of Fas-mediated apoptosis; regulator of cell fate in T cells and other hematopoietic lineages
24 | GCLC | Glutamate-cysteine ligase, catalytic subunit | Enzyme-cysteine, glutamate metabolism | First rate limiting enzyme of glutathione synthesis; deficiency of gamma-glutamylcysteine synthetase in humans is associated with enzymopathic hemolytic anemia
25 | GZMB | Granzyme B (granzyme 2, cytotoxic T-lymphocyte-associated serine esterase 1) | Apoptosis, Cytolysis | Crucial for the rapid induction of target cell apoptosis by CTL in cell-mediated immune response
26 | HLA-DRA | Major histocompatibility complex, class II, DR alpha | Membrane protein; antigen processing | Anchored heterodimeric molecule; cell-surface antigen presenting complex
27 | HMGB1 | High-mobility group box 1 | DNA repair, Signal transduction | Binds with high affinity to specific DNA structures such as bent or kinked DNA; considered to be a structural protein of chromatin
28 | ICOS | Inducible T-cell co-stimulator | Immune response | Plays role in cell-cell signaling, immune responses, and regulation of cell proliferation; member of CD28 and CTLA-4 cell-surface receptor family
29 | IFI16 | Interferon, gamma-inducible protein 16 | Cell signaling and activation | Transcriptional repressor
30 | IRF1 | Interferon regulatory factor 1 | Transcription factor | Activator of interferons alpha and beta transcription; transcription activator of genes induced by interferons alpha, beta, and gamma; regulates apoptosis and tumor-suppression; member of interferon regulatory transcription factor (IRF) family
31 | IL1R1 | Interleukin 1 receptor, type I | Cell signaling and activation | Receptor for interleukin alpha (IL1A), interleukin beta (IL1B), and interleukin 1 receptor, type I (IL1R1/IL1RA); mediator in cytokine induced immune and inflammatory responses; member of interleukin 1 receptor family
32 | IL23A | Interleukin 23, alpha subunit p19 | Cytokines/Chemokines/Growth Factors | Activates the transcription activator STAT4, and stimulates the production of interferon-gamma (IFNG); acts on memory CD4(+) T cells
33 | IL32 | Interleukin 32 | Cytokines/Chemokines/Growth Factors | Induces the production of TNFalpha from macrophage cells; member of cytokine family
34 | LTA | Lymphotoxin alpha (TNF superfamily, member 1) | Cytokine | Cytokine secreted by lymphocytes and cytotoxic for a range of tumor cells; active in vitro and in vivo
35 | MAP3K1 | Mitogen-activated protein kinase kinase kinase 1 | Protein serine/threonine kinase | Integrates cellular responses to a number of mitogenic and metabolic stimuli, including insulin and many growth factors
36 | MAPK14 | Mitogen-activated protein kinase 14 | Protein serine/threonine kinase | Binds to TRAF2 and stimulates NF-kappaB activity
37 | MHC2TA | Class II, major histocompatibility complex, transactivator | Transcription factor, Immune response | AKA CIITA; positive regulator of class II major histocompatibility complex gene transcription in the nucleus
38 | MIF | Macrophage migration inhibitory factor (glycosylation-inhibiting factor) | Cell signaling and growth factor | AKA: GIF; lymphokine, regulates macrophage functions through suppression of anti-inflammatory effects of glucocorticoids
39 | MMP12 | Matrix metallopeptidase 12 (macrophage elastase) | Proteinase/Proteinase Inhibitor | Involved in the breakdown of extracellular matrix in normal physiological processes and in disease processes, specifically the degradation of soluble and insoluble elastin
40 | MMP8 | Matrix metallopeptidase 8 (neutrophil collagenase) | Proteinase/Proteinase Inhibitor | Involved in the breakdown of extracellular matrix in normal physiological processes and in disease processes, specifically the degradation of type I, II and III collagens
41 | MNDA | Myeloid cell nuclear differentiation antigen | Transcription factor, Cellular defense response | Detected only in nuclei of cells of the granulocyte-monocyte lineage; participates in blood cell-specific responses to interferons
42 | MPO | Myeloperoxidase | Enzyme, Apoptosis inhibitor | Part of the host defense system of human polymorphonuclear leukocytes, responsible for microbicidal activity against a wide range of organisms
43 | MYC | v-myc myelocytomatosis viral oncogene homolog (avian) | Transcription factor, Cell proliferation | Promotes cell proliferation and transformation by activating growth-promoting genes; activates telomerase; activates transcription as part of a heteromeric complex with MAX
44 | NFKB1 | Nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 (p105) | Transcription factor | Encodes a 105 kD protein which can undergo cotranslational processing by the 26S proteasome to produce a 50 kD protein. The 105 kD protein is a Rel protein-specific transcription inhibitor and the 50 kD protein is a DNA binding subunit of the NF-kappa-B (NFKB) protein complex; activated NFKB translocates into the nucleus and stimulates the expression of genes involved in a wide variety of biological functions
45 | PLA2G2A | Phospholipase A2, group IIA (platelets, synovial fluid) | Enzyme-lipid catabolism | Regulates phospholipid metabolism in biomembranes, including eicosanoid biosynthesis; catalyzes the calcium-dependent hydrolysis of the 2-acyl groups in 3-sn-phosphoglycerides.
46 | PLAUR | Plasminogen activator, urokinase receptor | Signal transduction, chemotaxis | Localizes and promotes plasmin formation; binds urokinase plasminogen activator and permits the activation of the receptor-bound pro-enzyme by plasmin
47 | PRTN3 | Proteinase 3 (serine proteinase, neutrophil, Wegener granulomatosis autoantigen) | Regulation of cell proliferation, collagen catabolism | Cleaves elastin; key protease for factor-independent growth of hematopoietic cells
48 | PTX3 | Pentraxin-related gene, rapidly induced by IL-1 beta | Acute Phase Protein | Novel marker of inflammatory reactions; IL1b-TNF inducible protein found in endothelial cells
49 | SERPINA3 | Serine (or cysteine) proteinase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 1 | Acute phase response, Inflammatory response | Plasma protease inhibitor and member of the serine protease inhibitor class; tissue specific polymorphisms that influence protease targeting
50 | SSI-3 | Suppressor of cytokine signaling 3 | Protein kinase inhibitor, apoptosis inhibitor, signal transduction | Cytokine-inducible negative regulators of cytokine signaling; member of the STAT-induced STAT inhibitor (SSI) family, also known as suppressor of cytokine signaling (SOCS) family
51 | TLR2 | Toll-like receptor 2 | Cell Signaling and Activation | Mediator of peptidoglycan and lipoteichoic acid induced signaling
52 | TLR4 | Toll-like receptor 4 | Cell Signaling and Activation | Member of the Toll-like receptor (TLR) family which plays a fundamental role in pathogen recognition and activation of innate immunity; recognizes pathogen-associated molecular patterns (PAMPs) that are expressed on infectious agents, and mediates the production of cytokines necessary for the development of effective immunity
53 | TNFRSF17 | Tumor necrosis factor receptor superfamily, member 17 | Cytokines/Chemokines/Growth Factors | Important for B cell development and autoimmune response; specifically binds to tumor necrosis factor (ligand) superfamily, member 13b (TNFSF13B/TALL-1/BAFF); leads to NF-kappaB and MAPK8/JNK activation; binds to various TRAF family members and may transduce signals for cell survival and proliferation; member of the TNF-receptor superfamily
54 | TNFRSF1A | Tumor necrosis factor receptor superfamily, member 1A | Cytokines/Chemokines/Growth Factors | Major receptor for the tumor necrosis factor-alpha; activates NF-kappaB, mediates apoptosis, and functions as a regulator of inflammation; member of the TNF-receptor superfamily
55 | TXNRD1 | Thioredoxin reductase | Enzyme/Redox, Signal transduction | Reduces thioredoxins as well as other substrates; involved in selenium metabolism and protection against oxidative stress; member of a family of pyridine nucleotide oxidoreductases
56 | IL1A | Interleukin 1, alpha | Cytokines/Chemokines/Growth Factors | Proinflammatory; constitutively and inducibly expressed in a variety of cells. Generally cytosolic and released only during severe inflammatory disease
57 | IL1B | Interleukin 1, beta | Cytokines/Chemokines/Growth Factors | Proinflammatory; constitutively and inducibly expressed by many cell types, secreted
58 | TNF | Tumor necrosis factor, alpha | Cytokines/Chemokines/Growth Factors | Proinflammatory, TH1, mediates host response to bacterial stimulus, regulates cell growth & differentiation
59 | IL6 | Interleukin 6 (interferon, beta 2) | Cytokines/Chemokines/Growth Factors | Pro- and anti-inflammatory activity, TH2 cytokine, regulates hematopoietic system and activation of innate response
60 | IL8 | Interleukin 8 | Cytokines/Chemokines/Growth Factors | Proinflammatory, major secondary inflammatory mediator, cell adhesion, signal transduction, cell-cell signaling, angiogenesis, synthesized by a wide variety of cell types
61 | IFNG | Interferon gamma | Cytokines/Chemokines/Growth Factors | Pro- and anti-inflammatory activity, TH1 cytokine, nonspecific inflammatory mediator, produced by activated T-cells
62 | IL2 | Interleukin 2 | Cytokines/Chemokines/Growth Factors | T-cell growth factor, expressed by activated T cells, regulates lymphocyte activation and differentiation; inhibits apoptosis, TH1 cytokine
63 | IL12B | Interleukin 12 p40 | Cytokines/Chemokines/Growth Factors | Proinflammatory; mediator of innate immunity, TH1 cytokine, requires co-stimulation with IL-18 to induce IFN-g
64 | IL15 | Interleukin 15 | Cytokines/Chemokines/Growth Factors | Proinflammatory; mediates T-cell activation, inhibits apoptosis, synergizes with IL-2 to induce IFN-g and TNF-a
65 | IL18 | Interleukin 18 | Cytokines/Chemokines/Growth Factors | Proinflammatory, TH1, innate and acquired immunity, promotes apoptosis, requires co-stimulation with IL-1 or IL-2 to induce TH1 cytokines in T- and NK-cells
66 | IL4 | Interleukin 4 | Cytokines/Chemokines/Growth Factors | Anti-inflammatory; TH2; suppresses proinflammatory cytokines, increases expression of IL-1RN, regulates lymphocyte activation
67 | IL5 | Interleukin 5 | Cytokines/Chemokines/Growth Factors | Eosinophil stimulatory factor; stimulates late B cell differentiation to secretion of Ig
68 | IL10 | Interleukin 10 | Cytokines/Chemokines/Growth Factors | Anti-inflammatory; TH2; suppresses production of proinflammatory cytokines
69 | IL13 | Interleukin 13 | Cytokines/Chemokines/Growth Factors | Inhibits inflammatory cytokine production
70 | IL1RN | Interleukin 1 receptor antagonist | Cytokines/Chemokines/Growth Factors | IL1 receptor antagonist; Anti-inflammatory; inhibits binding of IL-1 to IL-1 receptor by binding to receptor without stimulating IL-1-like activity
71 | IL18BP | IL-18 Binding Protein | Cytokines/Chemokines/Growth Factors | Implicated in inhibition of early TH1 cytokine responses
72 | TGFB1 | Transforming growth factor, beta 1 | Cytokines/Chemokines/Growth Factors | Pro- and anti-inflammatory activity, anti-apoptotic; cell-cell signaling, can either inhibit or stimulate cell growth
73 | IFNA2 | Interferon, alpha 2 | Cytokines/Chemokines/Growth Factors | Interferon produced by macrophages with antiviral effects
74 | CXCL1 | Chemokine (C-X-C motif) ligand 1 (melanoma growth stimulating activity, alpha) | Cytokines/Chemokines/Growth Factors | Chemotactic for neutrophils; also plays a fundamental role in the development, homeostasis, and function of the immune system
75 | CXCL2 | Chemokine (C-X-C motif) ligand 2 | Cytokines/Chemokines/Growth Factors | AKA MIP2, SCYB2; Macrophage inflammatory protein produced by monocytes and neutrophils
76 | TNFSF5 | Tumor necrosis factor (ligand) superfamily, member 5 | Cytokines/Chemokines/Growth Factors | Ligand for CD40; expressed on the surface of T cells. It regulates B cell function by engaging CD40 on the B cell surface
77 | TNFSF6 | Tumor necrosis factor (ligand) superfamily, member 6 | Cytokines/Chemokines/Growth Factors | AKA FasL; Ligand for FAS antigen; transduces apoptotic signals into cells
78 | CSF3 | Colony stimulating factor 3 (granulocyte) | Cytokines/Chemokines/Growth Factors | AKA GCSF; cytokine that stimulates granulocyte development
79 | CD86 | CD86 molecule | Cell signaling and activation | This gene encodes a type I membrane protein that is a member of the immunoglobulin superfamily. This protein is expressed by antigen-presenting cells, and it is the ligand for two proteins at the cell surface of T cells, CD28 antigen and cytotoxic T-lymphocyte-associated protein 4
80 | CSF2 | Granulocyte-monocyte colony stimulating factor | Cytokines/Chemokines/Growth Factors | AKA GM-CSF; Hematopoietic growth factor; stimulates growth and differentiation of hematopoietic precursor cells from various lineages, including granulocytes, macrophages, eosinophils, and erythrocytes
81 | TNFSF13B | Tumor necrosis factor (ligand) superfamily, member 13b | Cytokines/Chemokines/Growth Factors | B cell activating factor, TNF family
82 | TNFRSF13B | Tumor necrosis factor receptor superfamily, member 13B | Cytokines/Chemokines/Growth Factors | The protein induces activation of the transcription factors NFAT, AP1, and NF-kappa-B and plays a crucial role in humoral immunity by interacting with a TNF ligand
83 | VEGF | Vascular endothelial growth factor | Cytokines/Chemokines/Growth Factors | Produced by monocytes
84 | ICAM1 | Intercellular adhesion molecule 1 | Cell Adhesion/Matrix Protein | Endothelial cell surface molecule; regulates cell adhesion and trafficking, upregulated during cytokine stimulation
85 | PTGS2 | Prostaglandin-endoperoxide synthase 2 | Enzyme/Redox | AKA COX2; Proinflammatory, member of arachidonic acid to prostanoid conversion pathway; induced by proinflammatory cytokines
86 | NOS2A | Nitric oxide synthase 2A | Enzyme/Redox | AKA iNOS; produces NO which is bactericidal/tumoricidal
87 | PLA2G7 | Phospholipase A2, group VII (platelet activating factor acetylhydrolase, plasma) | Enzyme/Redox | Platelet activating factor
88 | HMOX1 | Heme oxygenase (decycling) 1 | Enzyme/Redox | Endotoxin inducible
89 | F3 | F3 | Enzyme/Redox | AKA thromboplastin, Coagulation Factor 3; cell surface glycoprotein responsible for coagulation catalysis
90 | CD3Z | CD3 antigen, zeta polypeptide | Cell Marker | T-cell surface glycoprotein
91 | PTPRC | Protein tyrosine phosphatase, receptor type, C | Cell Marker | AKA CD45; mediates T-cell activation
92 | CD14 | CD14 antigen | Cell Marker | LPS receptor used as marker for monocytes
93 | CD4 | CD4 antigen (p55) | Cell Marker | Helper T-cell marker
94 | CD8A | CD8 antigen, alpha polypeptide | Cell Marker | Suppressor T cell marker
95 | CD19 | CD19 antigen | Cell Marker | AKA Leu 12; B cell growth factor
96 | HSPA1A | Heat shock protein 70 | Cell Signaling and activation | Heat shock protein 70 kDa
97 | MMP3 | Matrix metalloproteinase 3 | Proteinase/Proteinase Inhibitor | AKA stromelysin; degrades fibronectin, laminin and gelatin
98 | MMP9 | Matrix metalloproteinase 9 | Proteinase/Proteinase Inhibitor | AKA gelatinase B; degrades extracellular matrix molecules, secreted by IL-8-stimulated neutrophils
99 | PLAU | Plasminogen activator, urokinase | Proteinase/Proteinase Inhibitor | AKA uPA; cleaves plasminogen to plasmin (a protease responsible for nonspecific extracellular matrix degradation)
100 | SERPINE1 | Serine (or cysteine) protease inhibitor, clade B (ovalbumin), member 1 | Proteinase/Proteinase Inhibitor | Plasminogen activator inhibitor-1/PAI-1
101 | TIMP1 | Tissue inhibitor of metalloproteinase 1 | Proteinase/Proteinase Inhibitor | Irreversibly binds and inhibits metalloproteinases, such as collagenase
102 | C1QA | Complement component 1, q subcomponent, alpha polypeptide | Proteinase/Proteinase Inhibitor | Serum complement system; forms C1 complex with the proenzymes c1r and c1s
103 | HLA-DRB1 | Major histocompatibility complex, class II, DR beta 1 | Histocompatibility | Binds antigen for presentation to CD4+ cells

TABLE 3 Ranking of genes from most to least significant
Gene id# | Gene | Normals (0): N | Mean | Std. Dev. | RAs (1): N | Mean | Std. Dev. | 1-way ANOVA: F | p-value | Logit Model: p-value
45 | TLR2 | 133 | 16.1 | 0.6 | 22 | 14.6 | 0.6 | 99.3 | 2.5E−18 | 1.0E−16
31 | MMP9 | 131 | 16.0 | 1.3 | 22 | 13.5 | 1.2 | 70.3 | 3.3E−14 | 1.5E−14
20 | IFI16 | 133 | 16.8 | 0.8 | 22 | 15.3 | 0.7 | 65.1 | 1.9E−13 | 2.4E−14
43 | TGFB1 | 133 | 13.2 | 0.6 | 22 | 12.3 | 0.5 | 42.3 | 1.0E−09 | 1.3E−12
35 | NFKB1 | 134 | 17.4 | 0.7 | 22 | 16.5 | 0.4 | 41.0 | 1.8E−09 | 8.1E−12
44 | TIMP1 | 134 | 15.0 | 0.6 | 22 | 14.0 | 0.6 | 58.8 | 1.8E−12 | 1.5E−11
26 | IL1R1 | 133 | 21.1 | 1.0 | 22 | 19.4 | 0.9 | 54.9 | 8.1E−12 | 2.2E−11
42 | SERPING1 | 133 | 19.2 | 1.2 | 21 | 17.0 | 1.4 | 60.8 | 9.3E−13 | 3.9E−11
40 | SERPINA1 | 134 | 13.3 | 0.8 | 20 | 12.2 | 0.5 | 35.4 | 1.8E−08 | 4.4E−10
13 | EGR1 | 133 | 20.4 | 0.6 | 22 | 19.6 | 0.5 | 42.4 | 1.0E−09 | 1.0E−09
34 | MYC | 133 | 17.3 | 0.7 | 22 | 16.3 | 0.6 | 34.3 | 2.8E−08 | 1.3E−09
27 | IL1RN | 132 | 16.9 | 0.7 | 22 | 16.0 | 0.5 | 28.5 | 3.3E−07 | 1.2E−08
10 | CXCL1 | 134 | 20.0 | 0.6 | 22 | 19.2 | 0.5 | 31.6 | 8.5E−08 | 2.9E−08
37 | PLAUR | 134 | 15.1 | 0.6 | 22 | 14.4 | 0.4 | 28.6 | 3.2E−07 | 3.5E−08
41 | SERPINE1 | 133 | 22.3 | 0.9 | 22 | 21.1 | 0.8 | 39.0 | 4.1E−09 | 4.7E−08
33 | MPO | 134 | 21.1 | 0.9 | 22 | 19.6 | 1.5 | 45.1 | 3.4E−10 | 8.1E−08
5 | CD14 | 132 | 13.9 | 0.7 | 20 | 13.2 | 0.5 | 24.5 | 1.9E−06 | 2.0E−06
25 | IL1B | 133 | 16.7 | 0.8 | 22 | 15.9 | 0.4 | 23.3 | 3.3E−06 | 2.0E−06
2 | APAF1 | 134 | 16.5 | 0.5 | 22 | 15.8 | 0.7 | 26.0 | 9.7E−07 | 2.2E−06
16 | HMGB1 | 133 | 16.3 | 0.7 | 22 | 17.0 | 0.6 | 23.3 | 3.3E−06 | 3.9E−06
6 | CD19 | 133 | 18.2 | 0.8 | 22 | 19.1 | 1.0 | 25.0 | 1.5E−06 | 5.0E−06
23 | IL18 | 133 | 20.0 | 0.6 | 22 | 19.3 | 0.6 | 23.0 | 3.8E−06 | 1.9E−05
11 | CYBB | 133 | 14.0 | 0.6 | 22 | 13.4 | 0.6 | 17.4 | 5.0E−05 | 4.6E−05
21 | IL10 | 133 | 22.8 | 0.6 | 22 | 22.1 | 0.9 | 18.9 | 2.6E−05 | 8.6E−05
19 | ICAM1 | 134 | 17.7 | 0.6 | 22 | 17.2 | 0.4 | 12.9 | 4.3E−04 | 1.1E−04
1 | ADAM17 | 131 | 18.6 | 0.6 | 22 | 18.0 | 0.8 | 15.7 | 1.1E−04 | 1.2E−04
9 | CD8A | 133 | 15.8 | 0.7 | 22 | 16.5 | 1.0 | 17.2 | 5.4E−05 | 1.3E−04
39 | PTPRC | 130 | 11.9 | 0.5 | 21 | 11.5 | 0.5 | 11.4 | 9.4E−04 | 0.001
14 | ELA2 | 131 | 19.9 | 1.3 | 22 | 18.7 | 2.0 | 11.6 | 8.6E−04 | 0.001
18 | HSPA1A | 132 | 13.9 | 0.9 | 21 | 13.3 | 0.5 | 8.5 | 0.004 | 0.002
24 | IL18BP | 134 | 16.8 | 0.6 | 22 | 17.2 | 0.7 | 10.7 | 0.001 | 0.003
30 | LTA | 115 | 20.1 | 0.6 | 19 | 19.7 | 0.8 | 7.6 | 0.007 | 0.007
48 | TNFSF6 | 130 | 20.4 | 0.7 | 21 | 20.9 | 0.9 | 7.1 | 0.009 | 0.010
46 | TNF | 132 | 20.7 | 0.9 | 22 | 20.2 | 0.5 | 5.5 | 0.021 | 0.012
32 | MNDA | 133 | 12.6 | 0.7 | 22 | 12.2 | 0.6 | 5.5 | 0.020 | 0.017
3 | C1QA | 133 | 20.2 | 1.0 | 22 | 20.7 | 0.8 | 5.3 | 0.023 | 0.018
17 | HMOX1 | 134 | 16.5 | 0.7 | 22 | 16.1 | 0.7 | 5.3 | 0.023 | 0.019
7 | CD4 | 134 | 14.8 | 0.5 | 22 | 15.1 | 0.7 | 4.7 | 0.032 | 0.035
15 | GCLC | 131 | 18.9 | 0.6 | 21 | 18.7 | 0.7 | 1.9 | 0.165 | 0.170
8 | CD86 | 131 | 17.6 | 0.5 | 21 | 17.8 | 0.7 | 1.8 | 0.185 | 0.180
28 | IL6 | 134 | 23.5 | 0.3 | 22 | 23.5 | 0.3 | 0.4 | 0.513 | 0.510
36 | PLA2G7 | 129 | 19.3 | 0.6 | 22 | 19.2 | 1.0 | 0.4 | 0.514 | 0.510
12 | DPP4 | 134 | 18.4 | 0.6 | 22 | 18.4 | 0.7 | 0.2 | 0.642 | 0.640
38 | PTGS2 | 128 | 16.7 | 0.6 | 21 | 16.7 | 0.4 | 0.2 | 0.642 | 0.640
47 | TNFSF5 | 134 | 17.7 | 0.6 | 21 | 17.6 | 0.8 | 0.2 | 0.657 | 0.650
29 | IL8 | 116 | 21.0 | 1.4 | 21 | 21.0 | 1.4 | 0.0 | 0.880 | 0.880
22 | IL15 | 133 | 21.5 | 0.6 | 22 | 21.4 | 0.8 | 0.0 | 0.967 | 0.970
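The 1-way ANOVA column of Table 3 compares the Normal and RA groups one gene at a time. As a rough cross-check, the F statistic can be recomputed from each group's N, mean and standard deviation using the textbook between-group/within-group decomposition. The Python sketch below is illustrative only: the function name is hypothetical, it assumes the Std. Dev. columns are sample standard deviations (n − 1 denominator), and because Table 3 prints means and standard deviations rounded to one decimal place, the recomputed F can only approximate the reported value.

```python
def anova_f_from_summary(groups):
    """One-way ANOVA F statistic from per-group summary statistics.

    groups: list of (n, mean, sd) tuples, one per group (e.g. Normals, RAs).
    Assumes sd is the sample standard deviation (n - 1 denominator).
    Returns (F, df_between, df_within).
    """
    n_total = sum(n for n, _, _ in groups)
    k = len(groups)
    # Grand mean over all subjects, weighting each group mean by its size
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-group sum of squares recovered from each group's sample SD
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# TLR2 row of Table 3 as printed (means and SDs rounded to one decimal)
normals = (133, 16.1, 0.6)
ra = (22, 14.6, 0.6)
f_stat, df1, df2 = anova_f_from_summary([normals, ra])
print(f"TLR2: F({df1}, {df2}) = {f_stat:.1f}")
```

For the TLR2 row this gives F(1, 153) of about 118, against 99.3 in Table 3; the gap most likely reflects rounding of the printed means and standard deviations rather than a different test. With only two groups, F equals the square of the pooled two-sample t statistic.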

TABLE 4 Latent class modeling: ranking of p-values from most to least significant, 1-gene model estimating RA v. Normal discrimination (RA, N = 22, Normal N = 134)
Gene 1 | p-value | R-SQ
TLR2 | 1.00E−16 | 0.535
MMP9 | 1.50E−14 | 0.504
IFI16 | 2.40E−14 | 0.457
TGFB1 | 1.30E−12 | 0.377
NFKB1 | 8.10E−12 | 0.312
TIMP1 | 1.50E−11 | 0.370
IL1R1 | 2.20E−11 |
SERPING1 | 3.90E−11 |
SERPINA1 | 4.40E−10 |
EGR1 | 1.00E−09 |
MYC | 1.30E−09 |
IL1RN | 1.20E−08 |
CXCL1 | 2.90E−08 |
PLAUR | 3.50E−08 |
SERPINE1 | 4.70E−08 |
MPO | 8.10E−08 |
CD14 | 2.00E−06 |
IL1B | 2.00E−06 |
APAF1 | 2.20E−06 |
HMGB1 | 3.90E−06 |
CD19 | 5.00E−06 |
IL18 | 1.90E−05 |
CYBB | 4.60E−05 |
IL10 | 8.60E−05 |
ICAM1 | 0.00011 |
ADAM17 | 0.00012 |
CD8A | 0.00013 |
PTPRC | 0.001 |
ELA2 | 0.0012 |
HSPA1A | 0.0019 |
IL18BP | 0.0031 |
LTA | 0.0066 |
TNFSF6 | 0.01 |
TNF | 0.012 |
MNDA | 0.017 |
C1QA | 0.018 |
HMOX1 | 0.019 |
CD4 | 0.035 |
GCLC | 0.17 |
CD86 | 0.18 |
IL6 | 0.51 |
PLA2G7 | 0.51 |
PTGS2 | 0.64 |
DPP4 | 0.64 |
TNFSF5 | 0.65 |
IL8 | 0.88 |
IL15 | 0.97 |
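The 1-gene p-values in Table 4 coincide with the Logit Model column of Table 3. One way such a p-value can be obtained is to fit a logistic regression of disease status (RA = 1, Normal = 0) on a single gene's expression value and compare it with an intercept-only model using a likelihood-ratio test. The sketch below is a minimal illustration under stated assumptions: ordinary logistic regression is swapped in for the latent class software used to produce the table, the input data are simulated from the TLR2 means and standard deviations in Table 3 (they are not study data), and McFadden's pseudo-R² is used as a stand-in that is not necessarily the R-SQ statistic reported in Table 4. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit_1d(x, y, max_iter=50, tol=1e-10):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; return (b0, b1, log-likelihood)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood)
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Fisher information [[a, b], [b, c]] with weights p(1 - p)
        w = [pi * (1.0 - pi) for pi in p]
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        c = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * c - b * b
        # Newton step: inverse information times score
        d0 = (c * g0 - b * g1) / det
        d1 = (a * g1 - b * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    loglik = sum(
        math.log(sigmoid(b0 + b1 * xi)) if yi else math.log(1.0 - sigmoid(b0 + b1 * xi))
        for xi, yi in zip(x, y)
    )
    return b0, b1, loglik


def one_gene_lr_test(x, y):
    """Likelihood-ratio test of a 1-gene logit model against an intercept-only model."""
    mean_x = sum(x) / len(x)
    xc = [xi - mean_x for xi in x]  # centering improves numerical stability
    _, b1, ll_full = fit_logit_1d(xc, y)
    n1 = sum(y)
    n0 = len(y) - n1
    ll_null = n1 * math.log(n1 / len(y)) + n0 * math.log(n0 / len(y))
    lr_stat = max(2.0 * (ll_full - ll_null), 0.0)
    p_value = math.erfc(math.sqrt(lr_stat / 2.0))  # upper tail of chi-square, 1 df
    mcfadden_r2 = 1.0 - ll_full / ll_null
    return b1, p_value, mcfadden_r2


# Simulated data only: drawn from the TLR2 means/SDs in Table 3, not study data
rng = random.Random(2005)
x = [rng.gauss(16.1, 0.6) for _ in range(134)] + [rng.gauss(14.6, 0.6) for _ in range(22)]
y = [0] * 134 + [1] * 22  # 0 = Normal, 1 = RA
slope, p_value, r2 = one_gene_lr_test(x, y)
print(f"slope = {slope:.2f}, LR p-value = {p_value:.2g}, McFadden R2 = {r2:.3f}")
```

The chi-square upper tail with one degree of freedom is computed as erfc(√(LR/2)), which avoids any dependency beyond the standard library. A negative slope here simply reflects that the simulated RA group has the lower mean value for TLR2.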

TABLE 5 Latent class modeling: estimating RA v. Normal discrimination using a 2-gene model (RA, N = 22, Normal N = 134)
Gene 1 | Gene 2 | Gene 2 Incremental P-Value | % RA | % normal | R-SQ | Gene 1 Incremental P-Value
TLR2 | CD4 | 0.00055 | 91% | 98% | 0.758 | 1.2E−04
TLR2 | PTGS2 | 0.00100 | 77% | 99% | 0.704 | 8.5E−06
TLR2 | IL18BP | 0.00019 | 77% | 99% | 0.680 | 1.8E−05
TLR2 | HSPA1A | 0.00160 | 82% | 99% | 0.685 | 2.9E−05
TLR2 | HMGB1 | 0.00140 | 77% | 99% | 0.706 | 3.2E−06
TLR2 | C1QA | 0.00077 | 100% | 96% | 0.630 | 4.6E−06
TLR2 | MNDA | 0.00180 | 82% | 99% | 0.673 | 8.1E−07
TLR2 | CD19 | 0.00500 | 86% | 96% | 0.636 | 2.2E−05
TLR2 | CD86 | 0.00850 | 77% | 99% | 0.649 | 5.4E−06
TLR2 | SERPING1 | 0.00570 | 73% | 99% | 0.623 | 2.4E−05
TLR2 | CD8A | 0.00740 | 73% | 99% | 0.623 | 2.1E−06
TLR2 | PTPRC | 0.01500 | 77% | 97% | 0.601 | 4.0E−06
TLR2 | MYC | 0.01400 | 77% | 98% | 0.603 | 2.0E−05
MMP9 | SERPING1 | 0.00025 | 82% | 97% | 0.632 | 0.00001
MMP9 | PTGS2 | 0.00062 | 82% | 98% | 0.648 | 5.5E−07
MMP9 | IFI16 | 0.00075 | 77% | 97% | 0.602 | 0.00035
MMP9 | HSPA1A | 0.00130 | 73% | 97% | 0.615 | 4.5E−07
IFI16 | HMGB1 | 0.00017 | 86% | 97% | 0.650 | 3.90E−06
IFI16 | SERPINE1 | 0.00013 | 77% | 99% | 0.628 | 5.50E−06
IFI16 | CD19 | 0.00053 | 77% | 98% | 0.618 | 4.10E−06
TGFB1 | CD4 | 5.30E−05 | 91% | 97% | 0.728 | 3.80E−06
TGFB1 | IL18BP | 0.00011 | 68% | 99% | 0.628 | 2.10E−06
TGFB1 | PTGS2 | 9.80E−05 | 73% | 99% | 0.621 | 1.20E−06
NFKB1 | CD4 | 0.00024 | 82% | 100% | 0.789 | 2.60E−05
NFKB1 | IL18BP | 0.00015 | 82% | 99% | 0.746 | 2.20E−05
TIMP1 | CD4 | 7.30E−05 | 82% | 97% | 0.679 | 5.00E−06
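Table 5 reports, for each gene pair, the percentage of RA and Normal subjects correctly classified together with an incremental p-value for each gene given the other. The sketch below shows one way to compute analogous quantities with ordinary logistic regression: each incremental p-value is a 1-degree-of-freedom likelihood-ratio test of the 2-gene model against the corresponding 1-gene model, and a subject is assigned to RA when the fitted probability is at least 0.5. The 0.5 cutoff, the simulated TLR2/CD4 data (drawn independently from the Table 3 means and standard deviations, so any correlation between real genes is ignored), and all function names are assumptions for illustration; this is not the latent class analysis used to produce the table.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve the small linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logit(rows, y, max_iter=50, tol=1e-10):
    """Newton-Raphson logistic regression with an intercept.

    rows: one list of predictor values per subject. Returns (log-likelihood, fitted P(RA)).
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = [sigmoid(sum(bj * xj for bj, xj in zip(beta, xi))) for xi in X]
        grad = [sum((yi - pi) * xi[j] for xi, yi, pi in zip(X, y, p)) for j in range(k)]
        info = [[sum(pi * (1 - pi) * xi[j] * xi[col] for xi, pi in zip(X, p)) for col in range(k)]
                for j in range(k)]
        step = solve(info, grad)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    p = [sigmoid(sum(bj * xj for bj, xj in zip(beta, xi))) for xi in X]
    ll = sum(math.log(pi) if yi else math.log(1.0 - pi) for yi, pi in zip(y, p))
    return ll, p


def lr_pvalue(ll_full, ll_reduced):
    """p-value of a 1-df likelihood-ratio test (upper tail of chi-square with 1 df)."""
    stat = max(2.0 * (ll_full - ll_reduced), 0.0)
    return math.erfc(math.sqrt(stat / 2.0))


def percent_correct(p, y, cutoff=0.5):
    """Percent of RA (y=1) and Normal (y=0) subjects classified correctly at `cutoff`."""
    ra = [pi >= cutoff for pi, yi in zip(p, y) if yi == 1]
    normal = [pi < cutoff for pi, yi in zip(p, y) if yi == 0]
    return 100.0 * sum(ra) / len(ra), 100.0 * sum(normal) / len(normal)


def center(v):
    mean = sum(v) / len(v)
    return [vi - mean for vi in v]


# Simulated data only: TLR2 and CD4 drawn independently from the Table 3 means/SDs
rng = random.Random(2005)
y = [0] * 134 + [1] * 22  # 0 = Normal, 1 = RA
tlr2 = center([rng.gauss(16.1, 0.6) for _ in range(134)] + [rng.gauss(14.6, 0.6) for _ in range(22)])
cd4 = center([rng.gauss(14.8, 0.5) for _ in range(134)] + [rng.gauss(15.1, 0.7) for _ in range(22)])

ll_tlr2, _ = fit_logit([[v] for v in tlr2], y)
ll_cd4, _ = fit_logit([[v] for v in cd4], y)
ll_both, p_both = fit_logit(list(zip(tlr2, cd4)), y)

inc_p_gene2 = lr_pvalue(ll_both, ll_tlr2)  # does CD4 add to TLR2?
inc_p_gene1 = lr_pvalue(ll_both, ll_cd4)   # does TLR2 add to CD4?
pct_ra, pct_normal = percent_correct(p_both, y)
print(f"inc. p CD4 = {inc_p_gene2:.2g}, inc. p TLR2 = {inc_p_gene1:.2g}, "
      f"{pct_ra:.0f}% RA / {pct_normal:.0f}% normal correct")
```

Reporting both percentages separately, rather than a single overall accuracy, matters here because the groups are unbalanced (22 RA versus 134 Normal): a model that called every subject Normal would still be about 86% accurate overall while classifying 0% of RA subjects correctly.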

TABLE 6 Latent class modeling: estimating RA v. Normal discrimination in a dataset using a 3-gene model (RA, N = 22, Normal N = 134)
Gene 1 | Gene 2 | Gene 3 | Gene 3 Inc. P-Value (LG analysis, Latent Gold) | % RA | % normal | R-SQ | Gene 1 Inc. P-Value | Gene 2 Inc. P-Value
TLR2 | CD4 | NFKB1 | 0.0230 | 100% | 100% | 0.993 | 0.0320 | 0.0110
TLR2 | CD4 | MYC | 0.0370 | 96% | 99% | 0.933 | 0.0110 | 0.0120
TLR2 | CD4 | TNFSF5 | 0.0170 | 96% | 99% | 0.891 | 0.0020 | 0.0045
TLR2 | CD4 | LTA | 0.0110 | 91% | 100% | 0.860 | 0.0024 | 0.0028
TLR2 | CD4 | TGFB1 | 0.0079 | 91% | 99% | 0.846 | 0.0210 | 7.6E−04
TLR2 | CD4 | C1QA | 0.0140 | 96% | 99% | 0.831 | 1.7E−04 | 0.0021
TLR2 | CD4 | DPP4 | 0.0220 | 91% | 99% | 0.821 | 9.3E−04 | 0.0020
TLR2 | CD4 | HMGB1 | 0.0180 | 82% | 100% | 0.817 | 0.0010 | 0.0050
TLR2 | CD4 | EGR1 | 0.0020 | 82% | 100% | 0.812 | 0.0042 | 0.0098
TLR2 | CD4 | IL1R1 | 0.0370 | 86% | 99% | 0.786 | 0.0010 | 0.0065
TLR2 | CD4 | SERPINE1 | 0.0370 | 77% | 100% | 0.798 | 9.3E−04 | 0.0012
TLR2 | PTGS2 | NFKB1 | 0.0088 | 91% | 99% | 0.830 | 8.6E−04 | 0.0230
TLR2 | PTGS2 | IL1R1 | 0.0210 | 86% | 99% | 0.771 | 2.0E−04 | 0.0010
TLR2 | PTGS2 | C1QA | 0.0045 | 91% | 99% | 0.835 | 6.2E−04 | 0.0059
TLR2 | PTGS2 | MYC | 0.0110 | 82% | 100% | 0.788 | 2.3E−04 | 0.0017
TLR2 | PTGS2 | TGFB1 | 0.0110 | 91% | 99% | 0.814 | 0.0036 | 0.0015
TLR2 | PTGS2 | ICAM1 | 0.0360 | 86% | 100% | 0.741 | 5.3E−05 | 4.7E−04
TLR2 | PTGS2 | HMGB1 | 0.0130 | 86% | 99% | 0.778 | 7.1E−05 | 0.0085
TLR2 | PTGS2 | IL1RN | 0.0250 | 77% | 99% | 0.770 | 6.5E−05 | 9.3E−04
TLR2 | PTGS2 | SERPINE1 | 0.0450 | 82% | 99% | 0.739 | 9.7E−05 | 0.0020
TLR2 | PTGS2 | LTA | 0.0420 | 91% | 98% | 0.760 | 4.6E−05 | 8.6E−04
TLR2 | PTGS2 | TIMP1 | 0.0320 | 77% | 99% | 0.749 | 4.0E−04 | 0.0013
TLR2 | PTGS2 | IFI16 | 0.0470 | 73% | 100% | 0.752 | 6.0E−04 | 0.0024
TLR2 | PTGS2 | IL18BP | 0.0290 | 82% | 99% | 0.749 | 3.8E−05 | 0.0140
TLR2 | PTGS2 | CD19 | 0.0420 | 82% | 99% | 0.741 | 5.3E−05 | 0.0056
TLR2 | IL18BP | MYC | 0.0058 | 82% | 100% | 0.811 | 0.0024 | 0.0012
TLR2 | IL18BP | NFKB1 | 0.0062 | 96% | 99% | 0.802 | 0.0250 | 8.8E−04
TLR2 | IL18BP | C1QA | 0.0058 | 96% | 98% | 0.783 | 5.8E−05 | 0.0012
TLR2 | IL18BP | HMGB1 | 0.0076 | 86% | 99% | 0.779 | 1.0E−04 | 0.0020
TLR2 | IL18BP | TNFSF5 | 0.0130 | 86% | 99% | 0.754 | 1.3E−04 | 2.0E−04
TLR2 | IL18BP | HSPA1A | 0.0130 | 82% | 99% | 0.763 | 2.1E−04 | 0.0100
TLR2 | IL18BP | SERPING1 | 0.0260 | 82% | 99% | 0.759 | 1.9E−04 | 0.0092
TLR2 | IL18BP | IL1R1 | 0.0250 | 82% | 98% | 0.723 | 5.0E−04 | 1.9E−04
TLR2 | IL18BP | LTA | 0.0250 | 77% | 99% | 0.714 | 1.3E−04 | 2.1E−04
TLR2 | IL18BP | IFI16 | 0.0310 | 86% | 99% | 0.759 | 0.0027 | 4.8E−04
TLR2 | IL18BP | TGFB1 | 0.0310 | 77% | 99% | 0.714 | 0.0030 | 5.9E−04
TLR2 | IL18BP | SERPINE1 | 0.0340 | 91% | 97% | 0.711 | 2.1E−04 | 4.5E−04
TLR2 | IL18BP | MNDA | 0.0360 | 91% | 98% | 0.736 | 5.7E−06 | 0.0029
TLR2 | HSPA1A | NFKB1 | 0.0065 | 82% | 99% | 0.772 | 4.9E−04 | 2.8E−04
TLR2 | HSPA1A | ICAM1 | 0.0150 | 86% | 99% | 0.760 | 1.2E−04 | 0.0043
TLR2 | HSPA1A | MMP9 | 0.0120 | 82% | 99% | 0.778 | 0.0024 | 5.8E−04
TLR2 | HSPA1A | C1QA | 0.0042 | 100% | 98% | 0.790 | 1.1E−04 | 0.0057
TLR2 | HSPA1A | TNFSF6 | 0.0150 | 91% | 98% | 0.745 | 5.7E−05 | 0.0020
TLR2 | HSPA1A | IL1R1 | 0.0066 | 82% | 99% | 0.744 | 1.5E−04 | 6.8E−04
TLR2 | HSPA1A | EGR1 | 0.0093 | 96% | 99% | 0.787 | 5.6E−04 | 5.8E−04
TLR2 | HSPA1A | IL1B | 0.0150 | 91% | 97% | 0.748 | 1.0E−04 | 5.3E−04
TLR2 | HSPA1A | IL1RN | 0.0220 | 82% | 99% | 0.743 | 1.0E−04 | 4.3E−04
TLR2 | HSPA1A | MYC | 0.0210 | 82% | 99% | 0.739 | 1.6E−04 | 8.0E−04
TLR2 | HSPA1A | IFI16 | 0.0130 | 96% | 97% | 0.732 | 7.1E−04 | 0.0030
TLR2 | HSPA1A | TGFB1 | 0.0160 | 82% | 99% | 0.762 | 6.1E−04 | 4.5E−04
TLR2 | HSPA1A | CD19 | 0.0210 | 91% | 99% | 0.780 | 1.2E−04 | 0.0045
TLR2 | HSPA1A | SERPINE1 | 0.0100 | 86% | 99% | 0.755 | 2.6E−04 | 0.0014
TLR2 | HSPA1A | TIMP1 | 0.0190 | 82% | 99% | 0.734 | 3.1E−04 | 6.7E−04
TLR2 | HSPA1A | HMGB1 | 0.0180 | 91% | 99% | 0.783 | 2.4E−04 | 0.0200
TLR2 | HSPA1A | SERPING1 | 0.0260 | 86% | 98% | 0.722 | 2.4E−04 | 0.0060
TLR2 | HSPA1A | MPO | 0.0330 | 77% | 99% | 0.724 | 1.2E−04 | 0.0018
TLR2 | HMGB1 | C1QA | 0.0061 | 91% | 99% | 0.794 | 2.6E−05 | 0.0054
TLR2 | HMGB1 | IFI16 | 0.0083 | 77% | 99% | 0.741 | 0.0130 | 0.0016
TLR2 | HMGB1 | TNFSF6 | 0.0130 | 86% | 99% | 0.764 | 1.8E−05 | 0.0023
TLR2 | HMGB1 | CD8A | 0.0370 | 86% | 99% | 0.766 | 1.2E−05 | 0.0032
TLR2 | HMGB1 | GCLC | 0.0290 | 96% | 98% | 0.734 | 1.6E−05 | 7.5E−04
TLR2 | HMGB1 | IL1R1 | 0.0470 | 96% | 96% | 0.730 | 6.2E−04 | 0.0012
TLR2 | C1QA | CD4 | 0.0021 | 96% | 99% | 0.831 | 1.7E−04 | 0.0140
TLR2 | C1QA | HSPA1A | 0.0057 | 100% | 98% | 0.790 | 1.1E−04 | 0.0042
TLR2 | C1QA | CD19 | 0.0070 | 96% | 98% | 0.786 | 4.6E−05 | 0.0015
TLR2 | C1QA | HMGB1 | 0.0054 | 91% | 99% | 0.794 | 2.6E−05 | 0.0061
TLR2 | C1QA | APAF1 | 0.0097 | 86% | 98% | 0.736 | 1.8E−04 | 0.0019
TLR2 | C1QA | CD8A | 0.0230 | 96% | 97% | 0.715 | 1.3E−05 | 0.0025
TLR2 | MNDA | SERPING1 | 0.0180 | 82% | 99% | 0.724 | 4.8E−05 | 0.0057
TLR2 | MNDA | CD19 | 0.0220 | 91% | 99% | 0.767 | 4.8E−06 | 0.0093
TLR2 | MNDA | C1QA | 0.0120 | 86% | 98% | 0.704 | 3.9E−06 | 0.0310
TLR2 | MNDA | NFKB1 | 0.0270 | 91% | 98% | 0.716 | 8.0E−05 | 0.0007
TLR2 | MNDA | SERPINE1 | 0.0240 | 86% | 99% | 0.729 | 6.3E−06 | 0.0017
TLR2 | MNDA | IFI16 | 0.0290 | 86% | 98% | 0.688 | 3.7E−04 | 0.0026
TLR2 | MNDA | CD8A | 0.0430 | 86% | 98% | 0.706 | 7.2E−06 | 0.0079
TLR2 | MNDA | MYC | 0.0450 | 82% | 99% | 0.709 | 1.6E−05 | 0.0045
TLR2 | MNDA | IL1R1 | 0.0280 | 86% | 98% | 0.699 | 1.8E−05 | 8.1E−04
TLR2 | MNDA | MMP9 | 0.0440 | 91% | 97% | 0.707 | 0.0012 | 0.0011
TLR2 | CD19 | IFI16 | 0.0069 | 86% | 99% | 0.734 | 0.0054 | 0.0020
TLR2 | CD19 | SERPING1 | 0.0130 | 86% | 99% | 0.740 | 0.0019 | 0.0069
TLR2 | CD19 | MYC | 0.0130 | 82% | 99% | 0.724 | 0.0015 | 0.0050
TLR2 | CD19 | CD86 | 0.0420 | 91% | 98% | 0.707 | 1.6E−05 | 0.0280
TLR2 | CD19 | APAF1 | 0.0400 | 82% | 98% | 0.677 | 1.4E−04 | 0.0220
TLR2 | CD19 | NFKB1 | 0.0360 | 96% | 96% | 0.673 | 0.0050 | 0.0036
TLR2 | CD86 | MYC | 0.0140 | 86% | 99% | 0.746 | 2.1E−04 | 0.0100
TLR2 | CD86 | NFKB1 | 0.0210 | 91% | 99% | 0.722 | 0.0010 | 0.0035
TLR2 | CD86 | SERPING1 | 0.0190 | 91% | 97% | 0.697 | 8.0E−05 | 0.0260
TLR2 | CD86 | SERPINE1 | 0.0150 | 86% | 99% | 0.712 | 5.3E−05 | 0.0043
TLR2 | CD86 | MPO | 0.0390 | 86% | 98% | 0.671 | 6.1E−05 | 0.0053
TLR2 | CD86 | IFI16 | 0.0440 | 77% | 98% | 0.654 | 0.0010 | 0.0160
TLR2 | CD86 | TIMP1 | 0.0290 | 86% | 98% | 0.703 | 0.0010 | 0.0018
TLR2 | SERPING1 | C1QA | 0.0016 | 100% | 96% | 0.717 | 1.3E−04 | 0.0092
TLR2 | SERPING1 | IL15 | 0.0200 | 82% | 97% | 0.669 | 4.8E−05 | 0.0084
TLR2 | SERPING1 | MYC | 0.0170 | 86% | 98% | 0.689 | 5.0E−04 | 0.0084
TLR2 | SERPING1 | APAF1 | 0.0270 | 77% | 99% | 0.664 | 0.0020 | 0.0200
TLR2 | SERPING1 | TNFSF6 | 0.0170 | 77% | 99% | 0.660 | 5.3E−05 | 0.0096
TLR2 | SERPING1 | SERPINE1 | 0.0230 | 82% | 99% | 0.665 | 4.0E−04 | 0.0045
TLR2 | SERPING1 | IL1R1 | 0.0460 | 73% | 99% | 0.638 | 0.0028 | 0.0033
TLR2 | SERPING1 | PTPRC | 0.0400 | 77% | 99% | 0.669 | 4.7E−05 | 0.0160
TLR2 | CD8A | MYC | 0.0100 | 91% | 98% | 0.730 | 0.0001 | 0.0061
TLR2 | CD8A | APAF1 | 0.0250 | 82% | 98% | 0.651 | 0.0002 | 0.0300
TLR2 | CD8A | NFKB1 | 0.0290 | 73% | 99% | 0.686 | 0.0016 | 0.0028
TLR2 | PTPRC | NFKB1 | 0.0018 | 100% | 97% | 0.755 | 0.0014 | 0.0010
TLR2 | PTPRC | MYC | 0.0020 | 82% | 99% | 0.758 | 2.4E−05 | 0.0035
TLR2 | PTPRC | C1QA | 0.0020 | 100% | 97% | 0.365 | 1.6E−05 | 0.0410
TLR2 | PTPRC | IFI16 | 0.0160 | 73% | 99% | 0.657 | 5.4E−04 | 0.0120
TLR2 | PTPRC | TNFSF6 | 0.0220 | 82% | 97% | 0.632 | 7.2E−06 | 0.0290
TLR2 | PTPRC | TNFSF5 | 0.0330 | 77% | 99% | 0.669 | 1.1E−05 | 0.0028
TLR2 | MYC | APAF1 | 0.0062 | 86% | 97% | 0.683 | 8.9E−05 | 0.0140
TLR2 | MYC | C1QA | 0.0022 | 86% | 98% | 0.691 | 1.4E−04 | 0.0370
TLR2 | MYC | CD8A | 0.0061 | 91% | 98% | 0.730 | 1.1E−04 | 0.0100
TLR2 | MYC | DPP4 | 0.0140 | 77% | 98% | 0.676 | 7.9E−04 | 0.0020
TLR2 | MYC | PLA2G7 | 0.0170 | 86% | 99% | 0.677 | 3.8E−04 | 0.0046
TLR2 | MYC | IFI16 | 0.0170 | 73% | 99% | 0.635 | 0.0110 | 0.0130
TLR2 | MYC | CYBB | 0.0270 | 86% | 98% | 0.660 | 1.9E−04 | 0.0073
TLR2 | MYC | CD14 | 0.0280 | 82% | 98% | 0.660 | 9.5E−05 | 0.0096
TLR2 | MYC | SERPINE1 | 0.0490 | 73% | 98% | 0.622 | 1.1E−04 | 0.0170
MMP9 | SERPING1 | HSPA1A | 0.0011 | 91% | 99% | 0.821 | 2.0E−04 | 6.0E−04
MMP9 | SERPING1 | PTGS2 | 0.0029 | 96% | 98% | 0.789 | 7.4E−05 | 8.2E−04
MMP9 | SERPING1 | CD4 | 0.0026 | 77% | 100% | 0.757 | 3.6E−05 | 6.9E−04
MMP9 | SERPING1 | C1QA | 0.0069 | 100% | 97% | 0.713 | 7.5E−05 | 2.9E−04
MMP9 | SERPING1 | MNDA | 0.0066 | 91% | 98% | 0.713 | 1.1E−05 | 2.4E−04
MMP9 | SERPING1 | IL18BP | 0.0081 | 86% | 99% | 0.721 | 7.7E−05 | 2.8E−04
MMP9 | SERPING1 | IL1R1 | 0.0200 | 73% | 99% | 0.665 | 0.0010 | 3.4E−04
MMP9 | SERPING1 | MYC | 0.0280 | 86% | 98% | 0.676 | 2.2E−04 | 7.5E−04
MMP9 | SERPING1 | APAF1 | 0.0340 | 82% | 98% | 0.679 | 8.2E−05 | 6.1E−04
MMP9 | SERPING1 | CD86 | 0.0290 | 86% | 97% | 0.678 | 5.5E−05 | 3.7E−04
MMP9 | SERPING1 | CD19 | 0.0400 | 82% | 99% | 0.682 | 6.5E−05 | 1.9E−04
MMP9 | SERPING1 | SERPINE1 | 0.0250 | 82% | 99% | 0.678 | 1.8E−04 | 2.0E−04
MMP9 | SERPING1 | HMGB1 | 0.0330 | 86% | 99% | 0.684 | 5.5E−05 | 7.5E−04
MMP9 | SERPING1 | MPO | 0.0340 | 91% | 97% | 0.669 | 1.5E−04 | 3.1E−04
MMP9 | PTGS2 | NFKB1 | 0.0020 | 96% | 99% | 0.801 | 4.1E−04 | 8.6E−04
MMP9 | PTGS2 | MYC | 0.0023 | 82% | 99% | 0.756 | 6.2E−05 | 7.9E−04
MMP9 | PTGS2 | TGFB1 | 0.0023 | 82% | 99% | 0.757 | 0.0019 | 8.4E−04
MMP9 | PTGS2 | IL1R1 | 0.0039 | 82% | 99% | 0.741 | 5.8E−05 | 3.5E−04
MMP9 | PTGS2 | IFI16 | 0.0019 | 82% | 99% | 0.754 | 1.4E−04 | 0.0029
MMP9 | PTGS2 | MPO | 0.0079 | 91% | 96% | 0.684 | 6.1E−06 | 5.9E−04
MMP9 | PTGS2 | EGR1 | 0.0110 | 77% | 99% | 0.677 | 1.4E−05 | 0.0011
MMP9 | PTGS2 | PLAUR | 0.0230 | 77% | 99% | 0.701 | 3.0E−04 | 1.5E−04
MMP9 | PTGS2 | C1QA | 0.0081 | 86% | 98% | 0.708 | 2.1E−06 | 0.0014
MMP9 | PTGS2 | TNFSF5 | 0.0450 | 77% | 99% | 0.716 | 1.5E−06 | 3.9E−04
MMP9 | PTGS2 | SERPINA1 | 0.0400 | 86% | 99% | 0.696 | 3.4E−04 | 2.2E−04
MMP9 | PTGS2 | LTA | 0.0210 | 82% | 99% | 0.695 | 1.1E−06 | 4.2E−04
MMP9 | PTGS2 | TIMP1 | 0.0170 | 77% | 99% | 0.696 | 1.3E−04 | 4.8E−04
MMP9 | PTGS2 | ICAM1 | 0.0430 | 86% | 99% | 0.693 | 2.0E−06 | 2.2E−04
MMP9 | PTGS2 | TNF | 0.0430 | 77% | 99% | 0.702 | 3.6E−06 | 8.3E−04
MMP9 | PTGS2 | SERPINE1 | 0.0300 | 82% | 99% | 0.680 | 9.8E−06 | 0.0012
MMP9 HSPA1A IFI16 0.0001 86% 99% 0.808 2.5E−045.3E−04 MMP9 HSPA1A TLR2 0.0024 82% 99% 0.778 0.0120 5.8E−04 MMP9 HSPA1ANFKB1 0.0009 86% 99% 0.763 3.7E−05 2.6E−05 MMP9 HSPA1A EGR1 0.0015 86%99% 0.760 5.2E−05 3.9E−04 MMP9 HSPA1A MYC 0.0031 86% 99% 0.723 9.6E−061.8E−04 MMP9 HSPA1A IL1R1 0.0016 77% 98% 0.683 1.6E−05 6.4E−04 MMP9HSPA1A IL1B 0.0045 77% 99% 0.701 1.3E−05 2.2E−04 MMP9 HSPA1A SERPINA10.0110 73% 99% 0.686 2.1E−05 5.3E−05 MMP9 HSPA1A MPO 0.0035 82% 98%0.675 2.8E−06 7.3E−04 MMP9 HSPA1A PTGS2 0.0056 87% 99% 0.704 6.9E−060.0290 MMP9 HSPA1A ICAM1 0.0260 82% 99% 0.681 8.8E−07 1.2E−04 MMP9HSPA1A TGFB1 0.0100 82% 99% 0.684 2.8E−04 1.6E−04 MMP9 HSPA1A C1QA0.0093 86% 99% 0.695 1.5E−06 0.0079 MMP9 HSPA1A SERPINE1 0.0090 77% 99%0.690 4.3E−06 8.2E−04 MMP9 HSPA1A TIMP1 0.0190 73% 99% 0.695 5.9E−052.0E−04 MMP9 HSPA1A IL1RN 0.0400 68% 100% 0.656 6.0E−06 3.5E−04 MMP9HSPA1A IL18 0.0470 68% 100% 0.644 1.5E−06 5.8E−04 MMP9 IFI16 HMGB10.0027 96% 99% 0.746 0.0056 5.1E−04 MMP9 IFI16 CD4 0.0026 91% 99% 0.7613.0E−04 7.4E−04 MMP9 IFI16 MNDA 0.0040 77% 99% 0.702 1.0E−04 2.8E−04MMP9 IFI16 APAF1 0.0058 77% 100% 0.752 1.2E−04 0.0034 MMP9 IFI16 IL18BP0.0053 91% 99% 0.734 0.0019 0.0048 MMP9 IFI16 CD19 0.0068 77% 99% 0.7070.0043 0.0026 MMP9 IFI16 C1QA 0.0055 68% 99% 0.668 0.0006 0.0019 MMP9IFI16 CD86 0.0130 82% 97% 0.656 2.1E−04 4.6E−04 MMP9 IFI16 SERPINE10.0095 73% 99% 0.672 0.0230 6.2E−04 MMP9 IFI16 MYC 0.0180 73% 98% 0.6440.0039 0.0022 MMP9 IFI16 ADAM17 0.0200 73% 100% 0.676 1.5E−04 3.2E−04MMP9 IFI16 PTPRC 0.0220 77% 99% 0.685 6.5E−05 3.1E−04 MMP9 IFI16 CD140.0170 73% 99% 0.680 1.8E−04 3.1E−04 MMP9 IFI16 HMOX1 0.0320 86% 97%0.659 2.2E−04 0.0004 MMP9 IFI16 MPO 0.0330 73% 99% 0.626 0.0053 9.0E−04MMP9 IFI16 CD8A 0.0400 73% 98% 0.647 2.7E−04 2.4E−03 MMP9 IFI16 PLAUR0.0490 82% 99% 0.683 1.5E−04 4.5E−04 IFI16 HMGB1 IL1R1 0.0042 91% 98%0.775 3.5E−04 2.8E−04 IFI16 HMGB1 NFKB1 0.0066 91% 99% 0.777 6.3E−043.9E−04 IFI16 HMGB1 MPO 0.0096 77% 99% 0.729 8.8E−05 5.5E−04 IFI16 HMGB1MYC 0.0160 100% 97% 
0.740 6.3E−05 9.1E−04 IFI16 HMGB1 TIMP1 0.0077 96%96% 0.727 1.5E−04 0.0010 IFI16 HMGB1 IL18BP 0.0100 77% 99% 0.736 1.8E−050.0015 IFI16 HMGB1 SERPINE1 0.0095 96% 96% 0.721 2.0E−05 0.0036 IFI16HMGB1 CD19 0.0120 86% 98% 0.714 1.8E−05 0.0016 IFI16 HMGB1 ELA2 0.028082% 99% 0.702 1.4E−05 1.3E−04 IFI16 HMGB1 TGFB1 0.0220 82% 98% 0.7071.8E−04 0.0015 IFI16 HMGB1 IL10 0.0240 73% 99% 0.702 3.9E−05 3.6E−04IFI16 HMGB1 C1QA 0.0280 77% 99% 0.725 8.8E−06 6.1E−04 IFI16 HMGB1 PTGS20.0270 82% 99% 0.721 1.1E−05 4.0E−04 IFI16 HMGB1 ADAM17 0.0430 77% 99%0.683 1.4E−04 4.8E−05 IFI16 HMGB1 IL18 0.0041 77% 99% 0.725 6.8E−052.2E−04 IFI16 SERPINE1 C1QA 0.0022 86% 99% 0.757 2.1E−04 3.9E−04 IFI16SERPINE1 PTGS2 0.0039 82% 100% 0.798 1.2E−05 4.1E−05 IFI16 SERPINE1IL18BP 0.0009 86% 98% 0.771 2.7E−05 0.0012 IFI16 SERPINE1 CD4 0.0041 91%98% 0.768 4.4E−05 7.3E−05 IFI16 SERPINE1 HMOX1 0.0047 86% 98% 0.7319.0E−06 1.1E−04 IFI16 SERPINE1 CD86 0.0088 77% 99% 0.693 1.7E−05 1.0E−04IFI16 SERPINE1 MYC 0.0130 77% 99% 0.675 6.2E−05 0.0024 IFI16 SERPINE1HSPA1A 0.0060 73% 99% 0.711 7.5E−06 6.6E−05 IFI16 SERPINE1 CD19 0.023073% 100% 0.686 1.5E−05 0.0082 IFI16 SERPINE1 MNDA 0.0330 82% 97% 0.6662.5E−06 7.8E−05 IFI16 SERPINE1 TLR2 0.0390 73% 99% 0.663 0.0036 0.0062IFI16 SERPINE1 MMP9 0.0230 73% 99% 0.672 0.0006 0.0095 GoldMine IFI16CD19 NFKB1 1.4E−04 82% 99% 0.718 4.9E−04 0.0013 0.0023 IFI16 CD19 MYC1.7E−04 82% 99% 0.718 1.6E−04 0.0018 0.0037 IFI16 CD19 MMP9 6.4E−04 77%99% 0.707 2.6E−04 0.0068 0.0043 IFI16 CD19 C1QA 1.0E−03 82% 96% 0.6701.4E−05 0.0009 0.0034 TGFB1 CD4 NFKB1 3.0E−05 86% 100% 0.855 0.02300.0017 0.0078 TGFB1 CD4 TLR2 1.1E−04 91% 99% 0.846 0.0079 7.6E−04 0.021TGFB1 CD4 IFI16 1.9E−04 91% 99% 0.823 0.0004 0.0028 0.0064 TGFB1 CD4IL1R1 2.4E−04 100% 97% 0.821 3.0E−04 8.3E−04 0.0028 TGFB1 CD4 IL103.6E−04 96% 99% 0.842 1.2E−04 7.2E−04 0.0099 TGFB1 CD4 SERPINA1 8.4E−0486% 99% 0.800 3.1E−04 6.3E−04 0.014 TGFB1 IL18BP IFI16 8.6E−06 73% 100%0.740 0.0014 0.0016 0.00072 TGFB1 IL18BP SERPING1 2.9E−05 86% 99% 
0.7222.4E−04 0.0019 0.0021 TGFB1 IL18BP TLR2 6.4E−05 77% 99% 0.714 0.03105.9E−04 0.003 TGFB1 IL18BP PTGS2 1.9E−04 86% 99% 0.754 8.7E−06 0.00230.0024 TGFB1 IL18BP IL1R1 2.0E−04 82% 99% 0.727 8.5E−04 4.9E−04 0.0044TGFB1 PTGS2 IL1R1 1.0E−07 86% 99% 0.760 5.3E−04 6.5E−04 0.0025 TGFB1PTGS2 IFI16 1.2E−07 77% 100% 0.806 0.0027 0.0021 0.0019 TGFB1 PTGS2 TLR26.5E−07 91% 99% 0.814 0.0110 0.0015 0.0036 TGFB1 PTGS2 CD4 1.8E−06 91%99% 0.814 1.7E−05 0.0170 0.0016 TGFB1 PTGS2 SERPINA1 2.1E−06 96% 99%0.775 4.5E−04 0.0010 0.0034 TGFB1 PTGS2 MMP9 4.2E−06 82% 99% 0.7570.0023 8.4E−04 0.0019 TGFB1 PTGS2 SERPING1 4.0E−05 82% 99% 0.739 2.6E−040.0010 0.002 TGFB1 PTGS2 IL18BP 1.2E−04 86% 99% 0.754 8.7E−06 0.00240.0023 TGFB1 PTGS2 IL1B 1.5E−04 73% 100% 0.723 7.7E−05 9.7E−05 0.0028TGFB1 PTGS2 NFKB1 1.8E−04 82% 99% 0.711 0.0011 0.0041 0.0037 NFKB1 CD4TLR2 1.7E−06 100% 100% 0.993 0.0230 0.0110 0.032 NFKB1 CD4 MMP9 3.9E−0596% 100% 0.941 0.0190 0.0110 0.021 NFKB1 CD4 IL10 6.5E−05 96% 100% 0.9170.0090 0.0080 0.016 NFKB1 CD4 IFI16 1.1E−04 96% 99% 0.906 0.0013 0.00220.0075 NFKB1 CD4 TIMP1 4.7E−04 96% 99% 0.876 0.0038 0.0015 0.0091 NFKB1CD4 CD14 9.1E−04 96% 99% 0.852 8.6E−04 8.6E−04 0.018 NFKB1 CD4 IL1R10.0010 96% 99% 0.861 0.0079 0.0082 0.012 NFKB1 CD4 CYBB 0.0010 91% 99%0.863 4.3E−04 7.8E−04 0.015 NFKB1 IL18BP CD4 1.3E−04 91% 99% 0.8450.0060 0.0250 0.014 NFKB1 IL18BP SERPING1 1.7E−04 82% 100% 0.822 1.7E−049.2E−04 0.013 NFKB1 IL18BP PTGS2 4.8E−04 91% 99% 0.822 5.6E−05 7.0E−040.017 NFKB1 IL18BP IFI16 4.8E−04 82% 100% 0.820 8.2E−04 9.4E−04 0.0079TIMP1 CD4 MYC 0.0000 91% 100% 0.858 5.8E−04 6.7E−04 0.0064 TIMP1 CD4SERPING1 8.2E−05 91% 99% 0.804 1.3E−04 0.0011 0.0039 TIMP1 CD4 IFI161.2E−04 91% 99% 0.775 3.7E−04 0.0016 0.0037 TIMP1 CD4 SERPINA1 3.5E−0491% 98% 0.749 4.5E−04 4.3E−04 0.0048 TIMP1 CD4 EGR1 6.0E−04 100% 96%0.725 4.6E−04 1.8E−04 0.0098 TIMP1 CD4 TNFSF5 9.1E−04 86% 99% 0.7542.4E−05 4.4E−05 0.0051
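Claim 1 below requires constituents that distinguish normal from rheumatoid arthritis-diagnosed subjects with at least 75% accuracy. A minimal sketch of screening Table 6 against that threshold follows. It assumes, as an interpretation rather than something the table states, that the threshold applies separately to the % RA and % Normal columns; the five rows are copied from the LG Analysis section, and the helper name `passing_models` is hypothetical.

```python
# A few LG Analysis rows copied from Table 6:
# (gene 1, gene 2, gene 3, % RA, % Normal, R-SQ)
ROWS = [
    ("TLR2", "CD4", "NFKB1", 100, 100, 0.993),
    ("TLR2", "CD4", "MYC", 96, 99, 0.933),
    ("TLR2", "CD4", "TNFSF5", 96, 99, 0.891),
    ("TLR2", "PTGS2", "IFI16", 73, 100, 0.752),
    ("MMP9", "HSPA1A", "IL1RN", 68, 100, 0.656),
]


def passing_models(rows, threshold=75):
    """Keep models whose % RA and % Normal both meet the threshold
    (a per-group reading of the 75% criterion), highest R-SQ first."""
    kept = [r for r in rows if r[3] >= threshold and r[4] >= threshold]
    return sorted(kept, key=lambda r: r[5], reverse=True)


for gene1, gene2, gene3, pct_ra, pct_normal, r_sq in passing_models(ROWS):
    print(f"{gene1} {gene2} {gene3}: RA {pct_ra}%, Normal {pct_normal}%, R-SQ {r_sq}")
```

Ranking the surviving models by R-SQ is a design choice of this sketch; any other ordering column could be substituted.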

TABLE 7 Latent class modeling: ranking of p-values from most to least significant, 1-gene model estimating RA v. Normal discrimination (RA, N = 20, Normal N = 32)
Gene 1 | Incremental p-value | R-SQ
ICAM1 | 3.70E−06 | 0.344
STAT3 | 5.50E−06 | 0.365
TGFB1 | 7.50E−06 | 0.353
CSPG2 | 9.50E−06 | 0.314
TLR2 | 3.30E−05 | 0.295
HLADRA | 5.20E−05 | 0.288
IL1B | 0.00053
CASP9 | 0.00088
ITGAL | 0.0056
NFKBIB | 0.0075
EGR1 | 0.0079
SERPINE1 | 0.013
TSC22D3 | 0.014
NFKB1 | 0.015
TGFBR2 | 0.031
CD4 | 0.044
CASP3 | 0.063
MMP9 | 0.087
IL1RN | 0.098
CD14 | 0.17
BCL2 | 0.19
MEF2C | 0.28
HSPA1A | 0.52
IL18 | 0.56

TABLE 8 Latent class modeling: estimating RA v. Normal discrimination in a dataset using a 2-gene model (RA, N = 20, Normal N = 32)
Gene 1 | Gene 2 | Gene 2 Inc. p-value | % Normals | % RA | R-SQ | Gene 1 Inc. p-value
ICAM1 | HLADRA | 0.0020 | 91% | 90% | 0.677 | 0.0020
ICAM1 | HSPA1A | 0.0100 | 72% | 100% | 0.466 | 0.0003
ICAM1 | CD14 | 0.0160 | 91% | 70% | 0.457 | 0.0010
ICAM1 | TGFBR2 | 0.0310 | 72% | 95% | 0.433 | 0.0008
STAT3 | HSPA1A | 0.0029 | 91% | 95% | 0.703 | 0.0006
STAT3 | HLADRA | 0.0029 | 94% | 90% | 0.608 | 0.0011
STAT3 | CD14 | 0.0077 | 94% | 80% | 0.550 | 0.0007
STAT3 | TGFBR2 | 0.0280 | 88% | 85% | 0.469 | 0.0010
TGFB1 | HLADRA | 0.0012 | 94% | 90% | 0.675 | 0.0011
TGFB1 | HSPA1A | 0.0055 | 97% | 85% | 0.615 | 0.0004
CSPG2 | HLADRA | 0.0012 | 94% | 85% | 0.626 | 0.0014
CSPG2 | IL18 | 0.0052 | 75% | 95% | 0.478 | 0.0003
CSPG2 | CD14 | 0.0370 | 88% | 75% | 0.409 | 0.0006
TLR2 | HLADRA | 0.0020 | 91% | 85% | 0.562 | 0.0016
HLADRA | CASP9 | 0.0015 | 88% | 90% | 0.542 | 0.0005
HLADRA | MEF2C | 0.0017 | 91% | 85% | 0.552 | 0.0003
HLADRA | ITGAL | 0.0029 | 91% | 80% | 0.507 | 0.0004
HLADRA | IL1B | 0.0063 | 88% | 75% | 0.452 | 0.0016
HLADRA | NFKBIB | 0.0036 | 94% | 70% | 0.460 | 0.0009
HLADRA | CD4 | 0.0047 | 97% | 65% | 0.448 | 0.0004
HLADRA | NFKB1 | 0.0077 | 97% | 65% | 0.430 | 0.0005
HLADRA | TGFBR2 | 0.0083 | 91% | 80% | 0.463 | 0.0004
HLADRA | SERPINE1 | 0.0130 | 75% | 90% | 0.403 | 0.0011
HLADRA | CD14 | 0.0430 | 90.60% | 70.00% | 0.390 | 0.0008

TABLE 9 Latent class modeling: estimating RA v. Normal discrimination in a dataset using a 3-gene model (RA, N = 20, Normal N = 32)
Gene 1 | Gene 2 | Gene 3 | Gene 3 Inc. LG p-value | % Normals | % RA | R-SQ | Gene 1 Inc. p-value | Gene 2 Inc. p-value
ICAM1 | HLADRA | HSPA1A | 0.0270 | 97% | 95% | 0.827 | 0.0055 | 0.0110
ICAM1 | HLADRA | MMP9 | 0.0320 | 97% | 90% | 0.794 | 0.0030 | 0.0130
ICAM1 | HSPA1A | TGFB1 | 0.0100 | 97% | 85% | 0.695 | 0.0320 | 0.0043
ICAM1 | HSPA1A | CSPG2 | 0.0430 | 88% | 90% | 0.551 | 0.0081 | 0.0084
ICAM1 | CD14 | CSPG2 | 0.0092 | 100% | 70% | 0.625 | 0.0180 | 0.0100
ICAM1 | CD14 | TGFBR2 | 0.0470 | 94% | 75% | 0.535 | 0.0018 | 0.0260
ICAM1 | TGFBR2 | STAT3 | 0.0220 | 88% | 90% | 0.570 | 0.0250 | 0.0100
ICAM1 | TGFBR2 | TGFB1 | 0.0280 | 88% | 90% | 0.566 | 0.0100 | 0.0100
ICAM1 | TGFBR2 | CSPG2 | 0.0390 | 91% | 90% | 0.565 | 0.0084 | 0.0180
STAT3 | HSPA1A | TGFB1 | 0.0340 | 97% | 95% | 0.807 | 0.0068 | 0.0061
STAT3 | HLADRA | MMP9 | 0.0290 | 94% | 90% | 0.677 | 0.0009 | 0.0037
STAT3 | CD14 | CSPG2 | 0.0170 | 91% | 85% | 0.640 | 0.0035 | 0.0051
STAT3 | TGFBR2 | ICAM1 | 0.0250 | 88% | 90% | 0.570 | 0.0220 | 0.0100
TGFB1 | HLADRA | HSPA1A | 0.0170 | 97% | 95% | 0.792 | 0.0014 | 0.0020
TGFB1 | HSPA1A | MMP9 | 0.0370 | 97% | 85% | 0.699 | 0.0002 | 0.0025
CSPG2 | HLADRA | CD14 | 0.0370 | 97% | 90% | 0.708 | 0.0024 | 0.0035
CSPG2 | IL18 | CD14 | 0.0210 | 81% | 100% | 0.620 | 0.0013 | 0.0027
CSPG2 | IL18 | HSPA1A | 0.0490 | 88% | 85% | 0.552 | 0.0002 | 0.0040
CSPG2 | CD14 | IL1B | 0.0100 | 88% | 90% | 0.564 | 0.0009 | 0.0046
CSPG2 | CD14 | EGR1 | 0.0085 | 97% | 75% | 0.563 | 0.0010 | 0.0048
CSPG2 | CD14 | TGFB1 | 0.0260 | 100% | 70% | 0.532 | 0.0180 | 0.0140
CSPG2 | CD14 | CASP9 | 0.0310 | 91% | 80% | 0.509 | 0.0026 | 0.0083
TLR2 | HLADRA | MEF2C | 0.0260 | 97% | 95% | 0.671 | 0.0140 | 0.0012
TLR2 | HLADRA | MMP9 | 0.0430 | 97% | 80% | 0.617 | 0.0008 | 0.0028
HLADRA | CASP9 | HSPA1A | 0.0240 | 91% | 95% | 0.643 | 0.0024 | 0.0047
HLADRA | IL1B | CD4 | 0.0490 | 81% | 95% | 0.537 | 0.0008 | 0.0380

TABLE 10 Latent class modeling: estimating RA v. Normal discrimination in a dataset using a 4-gene model (RA, N = 20, Normal N = 32)
Gene 1 | Gene 2 | Gene 3 | Gene 4 | Gene 4 Inc. p-value | % RA | % Normals | R-Sq | Gene 1 Inc. p-value | Gene 2 Inc. p-value | Gene 3 Inc. p-value
ICAM1 | HLADRA | HSPA1A | TGFB1 | 0.037 | 100% | 100% | 97% | 0.049 | 0.012 | 0.029
ICAM1 | HSPA1A | TGFB1 | MEF2C | 0.035 | 90% | 100% | 0.7551 | 0.018 | 0.0046 | 0.0045
ICAM1 | HSPA1A | CSPG2 | IL18 | 0.043 | 70% | 100% | 0.6233 | 0.049 | 0.014 | 0.0084
ICAM1 | HSPA1A | CSPG2 | CD4 | 0.048 | 90% | 91% | 0.6477 | 0.0046 | 0.009 | 0.0096
ICAM1 | HSPA1A | CSPG2 | MEF2C | 0.046 | 100% | 84% | 0.6204 | 0.0085 | 0.011 | 0.022
ICAM1 | CD14 | CSPG2 | TGFBR2 | 0.036 | 100% | 91% | 0.7388 | 0.012 | 0.014 | 0.011
ICAM1 | CD14 | CSPG2 | NFKBIB | 0.041 | 85% | 97% | 0.7137 | 0.013 | 0.014 | 0.0071
ICAM1 | CD14 | TGFBR2 | CSPG2 | 0.011 | 100% | 91% | 0.7388 | 0.012 | 0.014 | 0.036
ICAM1 | CD14 | TGFBR2 | TGFB1 | 0.029 | 75% | 100% | 0.6232 | 0.02 | 0.036 | 0.025
STAT3 | HLADRA | MMP9 | CSPG2 | 0.018 | 90% | 97% | 0.7763 | 0.035 | 0.0087 | 0.025
STAT3 | HLADRA | MMP9 | EGR1 | 0.023 | 90% | 97% | 0.7668 | 0.0044 | 0.0077 | 0.019
TGFB1 | HLADRA | HSPA1A | ICAM1 | 0.049 | 100% | 100% | 0.9662 | 0.037 | 0.012 | 0.029
CSPG2 | HLADRA | CD14 | ITGAL | 0.03 | 100% | 100% | 0.9281 | 0.014 | 0.02 | 0.015
CSPG2 | HLADRA | CD14 | TGFB1 | 0.043 | 95% | 97% | 0.8883 | 0.024 | 0.013 | 0.019
CSPG2 | HLADRA | CD14 | STAT3 | 0.027 | 95% | 100% | 0.8679 | 0.023 | 0.018 | 0.019
CSPG2 | HLADRA | CD14 | CASP9 | 0.026 | 90% | 100% | 0.827 | 0.0057 | 0.011 | 0.017
CSPG2 | IL18 | CD14 | CASP9 | 0.028 | 95% | 91% | 0.7401 | 0.0077 | 0.0085 | 0.012
CSPG2 | IL18 | CD14 | IL1B | 0.029 | 95% | 91% | 0.7367 | 0.0049 | 0.012 | 0.069
CSPG2 | IL18 | CD14 | EGR1 | 0.047 | 85% | 97% | 0.7193 | 0.0054 | 0.014 | 0.013
CSPG2 | IL18 | HSPA1A | TGFB1 | 0.028 | 90% | 94% | 0.7179 | 0.013 | 0.024 | 0.013
CSPG2 | IL18 | HSPA1A | ICAM1 | 0.049 | 70% | 100% | 0.623 | 0.008 | 0.043 | 0.014
CSPG2 | CD14 | IL1B | IL18 | 0.012 | 95% | 91% | 0.7367 | 0.0049 | 0.0069 | 0.029
CSPG2 | CD14 | IL1B | EGR1 | 0.031 | 90% | 91% | 0.655 | 0.003 | 0.0027 | 0.024
CSPG2 | CD14 | EGR1 | IL18 | 0.014 | 85% | 97% | 0.7193 | 0.0054 | 0.013 | 0.047
CSPG2 | CD14 | EGR1 | CD4 | 0.033 | 85% | 88% | 0.6498 | 0.0033 | 0.006 | 0.007
CSPG2 | CD14 | TGFB1 | HLADRA | 0.013 | 95% | 97% | 0.8883 | 0.024 | 0.019 | 0.043
CSPG2 | CD14 | TGFB1 | CD4 | 0.033 | 90% | 88% | 0.5939 | 0.008 | 0.013 | 0.015
CSPG2 | CD14 | TGFB1 | MEF2C | 0.045 | 75% | 94% | 0.5822 | 0.012 | 0.02 | 0.017
CSPG2 | CD14 | CASP9 | HLADRA | 0.011 | 90% | 100% | 0.827 | 0.0057 | 0.017 | 0.026
CSPG2 | CD14 | CASP9 | IL18 | 0.0085 | 95% | 91% | 0.7401 | 0.0077 | 0.012 | 0.028
CSPG2 | CD14 | CASP9 | MEF2C | 0.045 | 80% | 94% | 0.5717 | 0.0023 | 0.014 | 0.014
TLR2 | HLADRA | MEF2C | HSPA1A | 0.034 | 95% | 94% | 0.7467 | 0.0063 | 0.003 | 0.032
TLR2 | HLADRA | MEF2C | MMP9 | 0.038 | 90% | 100% | 0.7412 | 0.0041 | 0.0019 | 0.021
TLR2 | HLADRA | MMP9 | CASP9 | 0.025 | 90% | 94% | 0.7223 | 0.013 | 0.0035 | 0.013
TLR2 | HLADRA | MMP9 | MEF2C | 0.021 | 90% | 100% | 0.7412 | 0.0041 | 0.0019 | 0.038
TLR2 | HLADRA | MMP9 | IL1B | 0.045 | 85% | 97% | 0.6899 | 0.0033 | 0.0029 | 0.014
HLADRA | CASP9 | HSPA1A | TLR2 | 0.029 | 100% | 97% | 0.8174 | 0.0045 | 0.017 | 0.014

TABLE 11 Results from estimating 4 models specifying different numbers of latent classes. The 2-class model is preferred according to the BIC criterion.
Model | BIC(LL)
1-Class | 828.3
2-Class | 800.4
3-Class | 809.3
4-Class | 831.4
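Table 11 chooses the number of latent classes with the Bayesian information criterion, BIC = −2 ln L + p ln N, where L is the maximized likelihood, p the number of free parameters and N the sample size; the model with the smallest BIC is preferred. The sketch below applies that rule to the reported BIC(LL) values. The dictionary layout and the function name `preferred_model` are illustrative.

```python
# BIC(LL) values from Table 11, keyed by the number of latent classes.
BIC = {1: 828.3, 2: 800.4, 3: 809.3, 4: 831.4}


def preferred_model(bic):
    """Return the number of classes whose model has the smallest BIC."""
    return min(bic, key=bic.get)


best = preferred_model(BIC)
print(best)  # 2

# How far each model sits above the preferred one, in BIC units.
deltas = {k: round(v - BIC[best], 1) for k, v in BIC.items()}
print(deltas)  # {1: 27.9, 2: 0.0, 3: 8.9, 4: 31.0}
```

The 3-class model is the nearest competitor, about 8.9 BIC units worse than the 2-class model.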

TABLE 12 Estimated posterior membership probabilities for Normals
Group | id# | tPn | CD4 | Latent Class 1 | Latent Class 2 | Assigned to Modal class
N | 1 | 34.41 | 15.03 | 1.0000 | 0.0000 | 1
N | 2 | 34.48 | 15.28 | 1.0000 | 0.0000 | 1
N | 3 | 33.66 | 14.87 | 0.9991 | 0.0009 | 1
N | 4 | 33.51 | 14.76 | 0.9985 | 0.0015 | 1
N | 5 | 34.57 | 15.3 | 1.0000 | 0.0000 | 1
N | 6 | 34.46 | 15.57 | 0.9999 | 0.0001 | 1
N | 7 | 33.48 | 15.46 | 0.9820 | 0.0180 | 1
N | 8 | 33.73 | 15.65 | 0.9856 | 0.0144 | 1
N | 9 | 33.95 | 14.8 | 0.9997 | 0.0003 | 1
N | 10 | 34.65 | 15.29 | 1.0000 | 0.0000 | 1
N | 11 | 35.15 | 14.73 | 1.0000 | 0.0000 | 1
N | 12 | 33.63 | 15.07 | 0.9984 | 0.0016 | 1
N | 13 | 34.14 | 15.31 | 0.9997 | 0.0003 | 1
N | 14 | 33.66 | 14.3 | 0.9986 | 0.0014 | 1
N | 15 | 35.03 | 14.97 | 1.0000 | 0.0000 | 1
N | 16 | 32.32 | 14.67 | 0.9230 | 0.0770 | 1
N | 17 | 32.07 | 14.48 | 0.8999 | 0.1001 | 1
N | 18 | 33.81 | 14.89 | 0.9995 | 0.0005 | 1
N | 19 | 33.06 | 14.15 | 0.9911 | 0.0089 | 1
N | 20 | 32.91 | 14.11 | 0.9866 | 0.0134 | 1
N | 21 | 31.75 | 14.65 | 0.6621 | 0.3379 | 1
N | 22 | 33.54 | 14.82 | 0.9986 | 0.0014 | 1
N | 23 | 33.56 | 15.22 | 0.9963 | 0.0037 | 1
N | 24 | 32.16 | 14.5 | 0.9181 | 0.0819 | 1
N | 25 | 33.29 | 14.08 | 0.9934 | 0.0066 | 1
N | 26 | 32.8 | 14.18 | 0.9851 | 0.0149 | 1
N | 27 | 33.67 | 15.26 | 0.9974 | 0.0026 | 1
N | 28 | 31.49 | 13.83 | 0.8031 | 0.1969 | 1
N | 29 | 35.89 | 16.05 | 1.0000 | 0.0000 | 1
N | 30 | 34.22 | 14.91 | 0.9999 | 0.0001 | 1
N | 31 | 33.58 | 14.67 | 0.9989 | 0.0011 | 1
N | 32 | 34.57 | 14.67 | 1.0000 | 0.0000 | 1
N | 33 | 34.16 | 15.1 | 0.9998 | 0.0002 | 1
N | 34 | 35.02 | 15.48 | 1.0000 | 0.0000 | 1
N | 35 | 33.79 | 14.9 | 0.9994 | 0.0006 | 1
N | 36 | 34.45 | 15.06 | 1.0000 | 0.0000 | 1
N | 37 | 34.79 | 15.36 | 1.0000 | 0.0000 | 1
N | 38 | 33.35 | 15.02 | 0.9952 | 0.0048 | 1
N | 39 | 34.59 | 15.3 | 1.0000 | 0.0000 | 1
N | 40 | 33.48 | 14.73 | 0.9984 | 0.0016 | 1
N | 41 | 34.43 | 15.4 | 0.9999 | 0.0001 | 1
N | 42 | 35.06 | 15.23 | 1.0000 | 0.0000 | 1
N | 43 | 33.82 | 14.71 | 0.9995 | 0.0005 | 1
N | 44 | 33.87 | 14.9 | 0.9996 | 0.0004 | 1
N | 45 | 33.82 | 15.16 | 0.9991 | 0.0009 | 1
N | 46 | 34.78 | 15.05 | 1.0000 | 0.0000 | 1
N | 47 | 33.79 | 14.94 | 0.9994 | 0.0006 | 1
N | 48 | 34.37 | 15.67 | 0.9996 | 0.0004 | 1
N | 49 | 32.28 | 14.47 | 0.9442 | 0.0558 | 1
N | 50 | 32.34 | 14.85 | 0.8674 | 0.1326 | 1
N | 51 | 32.85 | 15.08 | 0.9506 | 0.0494 | 1
N | 52 | 32.31 | 14.71 | 0.9099 | 0.0901 | 1
N | 53 | 34.56 | 15.39 | 1.0000 | 0.0000 | 1
N | 54 | 34.28 | 15 | 0.9999 | 0.0001 | 1
N | 56 | 34.34 | 15.19 | 0.9999 | 0.0001 | 1
N | 57 | 30.89 | 13.93 | 0.6329 | 0.3671 | 1
N | 58 | 32.66 | 14.84 | 0.9598 | 0.0402 | 1
N | 59 | 31.4 | 14.22 | 0.7389 | 0.2611 | 1
N | 60 | 32.38 | 14.33 | 0.9625 | 0.0375 | 1
N | 61 | 32.29 | 14.53 | 0.9402 | 0.0598 | 1
N | 62 | 32.61 | 13.94 | 0.9638 | 0.0362 | 1
N | 63 | 33.13 | 14.37 | 0.9949 | 0.0051 | 1
N | 64 | 33.81 | 14.76 | 0.9995 | 0.0005 | 1
N | 65 | 33.08 | 14.73 | 0.9931 | 0.0069 | 1
N | 66 | 33.91 | 14.96 | 0.9996 | 0.0004 | 1
N | 67 | 33.44 | 14.54 | 0.9982 | 0.0018 | 1
N | 68 | 34.07 | 15.03 | 0.9998 | 0.0002 | 1
N | 69 | 35.18 | 15.18 | 1.0000 | 0.0000 | 1
N | 70 | 33.88 | 14.86 | 0.9996 | 0.0004 | 1
N | 71 | 32.03 | 14.74 | 0.7698 | 0.2302 | 1
N | 72 | 34.05 | 15.32 | 0.9995 | 0.0005 | 1
N | 73 | 34.19 | 14.79 | 0.9999 | 0.0001 | 1
N | 74 | 34.77 | 15.06 | 1.0000 | 0.0000 | 1
N | 75 | 31.99 | 13.93 | 0.9048 | 0.0952 | 1
N | 76 | 33.56 | 14.68 | 0.9988 | 0.0012 | 1
N | 77 | 33.84 | 14.89 | 0.9995 | 0.0005 | 1
N | 78 | 30.86 | 13.88 | 0.6323 | 0.3677 | 1
N | 79 | 33.23 | 15.14 | 0.9878 | 0.0122 | 1
N | 80 | 34.05 | 14.88 | 0.9998 | 0.0002 | 1
N | 81 | 35.2 | 15.73 | 1.0000 | 0.0000 | 1
N | 82 | 33.27 | 14.66 | 0.9968 | 0.0032 | 1
N | 83 | 34.29 | 15.34 | 0.9998 | 0.0002 | 1
N | 84 | 33.1 | 14.04 | 0.9891 | 0.0109 | 1
N | 85 | 33.92 | 14.64 | 0.9997 | 0.0003 | 1
N | 86 | 32.29 | 14.17 | 0.9547 | 0.0453 | 1
N | 87 | 33.78 | 15.12 | 0.9991 | 0.0009 | 1
N | 88 | 32.8 | 15.2 | 0.8895 | 0.1105 | 1
N | 89 | 34.62 | 15.37 | 1.0000 | 0.0000 | 1
N | 90 | 30.66 | 13.63 | 0.5947 | 0.4053 | 1
N | 91 | 32.75 | 14.99 | 0.9474 | 0.0526 | 1
N | 92 | 33.62 | 14.28 | 0.9983 | 0.0017 | 1
N | 93 | 31.47 | 13.97 | 0.8066 | 0.1934 | 1
N | 94 | 33.55 | 14.92 | 0.9984 | 0.0016 | 1
N | 95 | 32.69 | 13.74 | 0.9373 | 0.0627 | 1
N | 96 | 34.59 | 14.56 | 1.0000 | 0.0000 | 1
N | 97 | 32.34 | 14.46 | 0.9542 | 0.0458 | 1
N | 98 | 33.64 | 14.43 | 0.9989 | 0.0011 | 1
N | 99 | 34.56 | 14.89 | 1.0000 | 0.0000 | 1
N | 100 | 32.9 | 14.78 | 0.9854 | 0.0146 | 1
N | 101 | 32.93 | 14.73 | 0.9885 | 0.0115 | 1
N | 102 | 32.74 | 14.75 | 0.9758 | 0.0242 | 1
N | 103 | 32.67 | 14.74 | 0.9703 | 0.0297 | 1
N | 104 | 32.48 | 14.54 | 0.9654 | 0.0346 | 1
N | 105 | 32.93 | 14.99 | 0.9753 | 0.0247 | 1
N | 106 | 33.71 | 14.26 | 0.9986 | 0.0014 | 1
N | 107 | 33.72 | 14.43 | 0.9991 | 0.0009 | 1
N | 108 | 33.25 | 14.75 | 0.9962 | 0.0038 | 1
N | 109 | 35.38 | 16.17 | 1.0000 | 0.0000 | 1
N | 110 | 31.79 | 14.12 | 0.8796 | 0.1204 | 1
N | 111 | 31.36 | 13.69 | 0.7446 | 0.2554 | 1
N | 112 | 39.54 | 16.75 | 1.0000 | 0.0000 | 1
N | 113 | 34.28 | 14.68 | 0.9999 | 0.0001 | 1
N | 114 | 31.66 | 14.01 | 0.8523 | 0.1477 | 1
N | 115 | 31.92 | 14.44 | 0.8674 | 0.1326 | 1
N | 116 | 32.94 | 15.07 | 0.9680 | 0.0320 | 1
N | 117 | 33.17 | 14.3 | 0.9950 | 0.0050 | 1
N | 118 | 33.15 | 15 | 0.9897 | 0.0103 | 1
N | 119 | 34.91 | 15.13 | 1.0000 | 0.0000 | 1
N | 120 | 33.68 | 14.76 | 0.9992 | 0.0008 | 1
N | 121 | 33.7 | 15.14 | 0.9985 | 0.0015 | 1
N | 122 | 32.22 | 14.16 | 0.9474 | 0.0526 | 1
N | 123 | 33.48 | 14.96 | 0.9977 | 0.0023 | 1
N | 124 | 34.15 | 14.65 | 0.9998 | 0.0002 | 1
N | 125 | 32.67 | 14.65 | 0.9767 | 0.0233 | 1
N | 126 | 32.75 | 14.38 | 0.9855 | 0.0145 | 1
N | 127 | 33.52 | 15.04 | 0.9976 | 0.0024 | 1
N | 128 | 33.77 | 14.79 | 0.9994 | 0.0006 | 1
N | 129 | 33.54 | 15.15 | 0.9969 | 0.0031 | 1
N | 130 | 32.77 | 14.88 | 0.9693 | 0.0307 | 1
N | 131 | 34.32 | 14.95 | 0.9999 | 0.0001 | 1
N | 132 | 34.38 | 15.58 | 0.9998 | 0.0002 | 1
N | 133 | 32.54 | 15.16 | 0.7576 | 0.2424 | 1
N | 134 | 32.59 | 14.8 | 0.9545 | 0.0455 | 1
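The posterior membership probabilities in Tables 12 to 14 follow from Bayes' rule: a subject's probability of belonging to class k is proportional to the class prior times the likelihood of the subject's indicator values under class k, normalized over the classes. The sketch below shows that computation for a 2-class model on the two indicators tPn and CD4. All parameter values are hypothetical placeholders, not the fitted estimates behind these tables, and normally distributed indicators that are independent within a class (local independence) are an assumption of the sketch.

```python
import math

# Hypothetical parameters for a 2-class model on (tPn, CD4).
# Illustrative only: NOT the fitted values behind Tables 12-14.
PRIORS = (0.85, 0.15)                  # P(class 1), P(class 2)
MEANS = ((33.6, 14.8), (31.0, 15.2))   # class-specific indicator means
SDS = ((0.8, 0.5), (0.8, 0.5))         # class-specific indicator SDs


def log_normal_pdf(x, mu, sd):
    """Log density of N(mu, sd^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mu) ** 2 / (2.0 * sd * sd)


def posterior(x):
    """Posterior class-membership probabilities for one subject x = (tPn, CD4),
    assuming the indicators are independent within each class."""
    log_joint = [
        math.log(prior) + sum(log_normal_pdf(xi, m, s) for xi, m, s in zip(x, mu, sd))
        for prior, mu, sd in zip(PRIORS, MEANS, SDS)
    ]
    top = max(log_joint)  # subtract the maximum (log-sum-exp) for numerical stability
    weights = [math.exp(v - top) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


def modal_assignment(x):
    """Class (numbered from 1) with the highest posterior probability."""
    probs = posterior(x)
    return probs.index(max(probs)) + 1


print(posterior((34.41, 15.03)))        # indicator values of subject N 1 in Table 12
print(modal_assignment((30.44, 16.7)))  # indicator values of subject RA 1 in Table 13 -> 2
```

Working in log space keeps the computation stable when one class's likelihood is many orders of magnitude smaller than the other's.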

TABLE 13 Estimated posterior membership probabilities for RA subjects
Group | id# | tPn | CD4 | Latent Class 1 | Latent Class 2 | Assigned to Modal class
RA | 1 | 30.44 | 16.7 | 0.0000 | 1.0000 | 2
RA | 2 | 32.55 | 15.68 | 0.0521 | 0.9479 | 2
RA | 3 | 30.26 | 14.66 | 0.0135 | 0.9865 | 2
RA | 4 | 31.69 | 15.36 | 0.0120 | 0.9880 | 2
RA | 5 | 32.56 | 15.41 | 0.3758 | 0.6242 | 2
RA | 6 | 31.35 | 15.05 | 0.0354 | 0.9646 | 2
RA | 7 | 31.3 | 14.77 | 0.1764 | 0.8236 | 2
RA | 8 | 30.91 | 15.21 | 0.0012 | 0.9988 | 2
RA | 9 | 32.13 | 15.6 | 0.0113 | 0.9887 | 2
RA | 10 | 31.48 | 16.37 | 0.0000 | 1.0000 | 2
RA | 11 | 31.39 | 14.73 | 0.2747 | 0.7253 | 2
RA | 12 | 31.91 | 14.96 | 0.4034 | 0.5966 | 2
RA | 13 | 30.15 | 13.81 | 0.4312 | 0.5688 | 2
RA | 14 | 31.43 | 15.22 | 0.0123 | 0.9877 | 2
RA | 15 | 30.65 | 14.28 | 0.2956 | 0.7044 | 2
RA | 16 | 30.51 | 14.59 | 0.0496 | 0.9504 | 2
RA | 17 | 30.47 | 14.42 | 0.1194 | 0.8806 | 2
RA | 18 | 30.13 | 15.05 | 0.0002 | 0.9998 | 2
RA | 19 | 30.76 | 15.79 | 0.0000 | 1.0000 | 2
RA | 20 | 31.95 | 15.49 | 0.0123 | 0.9877 | 2
RA | 21 | 30.22 | 14.22 | 0.1714 | 0.8286 | 2
RA | 22 | 30.81 | 14.52 | 0.1678 | 0.8322 | 2

TABLE 14 Estimated posterior membership probabilities for MS subjects
Group | id# | tPn | CD4 | Latent Class 1 | Latent Class 2 | Assigned to Modal class
MS | ms1 | 31.91 | 14.2 | 0.8998 | 0.1002 | 1
MS | ms10 | 32.47 | 14.51 | 0.9655 | 0.0345 | 1
MS | ms11 | 30.84 | 14.33 | 0.3490 | 0.6510 | 2
MS | ms2 | 30.59 | 14.61 | 0.0550 | 0.9450 | 2
MS | ms3 | 32.93 | 14.44 | 0.9913 | 0.0087 | 1
MS | ms4 | 32.47 | 14.8 | 0.9309 | 0.0691 | 1
MS | ms5 | 32.08 | 14.52 | 0.8938 | 0.1062 | 1
MS | ms6 | 34.26 | 14.62 | 0.9999 | 0.0001 | 1
MS | ms7 | 33.24 | 14.95 | 0.9940 | 0.0060 | 1
MS | ms8 | 32.08 | 15.87 | 0.0004 | 0.9996 | 2
MS | ms9 | 33.14 | 14.72 | 0.9947 | 0.0053 | 1
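Each subject in Tables 12 to 14 is assigned to the latent class with the larger posterior probability (modal assignment). The check below uses the MS rows of Table 14 as data, confirms that rule against the reported assignments, and tallies the result: 8 of the 11 MS subjects fall in Latent Class 1, the class to which every Normal in Table 12 is assigned, and 3 fall in Latent Class 2, the class of every RA subject in Table 13. The helper name `modal_class` is illustrative.

```python
from collections import Counter

# (id, posterior for Latent Class 1, posterior for Latent Class 2, reported modal class),
# copied from Table 14
MS_ROWS = [
    ("ms1", 0.8998, 0.1002, 1),
    ("ms10", 0.9655, 0.0345, 1),
    ("ms11", 0.3490, 0.6510, 2),
    ("ms2", 0.0550, 0.9450, 2),
    ("ms3", 0.9913, 0.0087, 1),
    ("ms4", 0.9309, 0.0691, 1),
    ("ms5", 0.8938, 0.1062, 1),
    ("ms6", 0.9999, 0.0001, 1),
    ("ms7", 0.9940, 0.0060, 1),
    ("ms8", 0.0004, 0.9996, 2),
    ("ms9", 0.9947, 0.0053, 1),
]


def modal_class(p1, p2):
    """Modal assignment: the latent class with the larger posterior probability."""
    return 1 if p1 >= p2 else 2


# The reported assignments agree with the modal rule.
for subject, p1, p2, reported in MS_ROWS:
    assert modal_class(p1, p2) == reported, subject

counts = Counter(modal_class(p1, p2) for _, p1, p2, _ in MS_ROWS)
print(counts)  # Counter({1: 8, 2: 3})
```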

TABLE 15 Relationship between the Clinical Outcomes at baseline and gene expression for the 22 washed-out RAs. Genes found to be significantly related to each of 10 clinical outcomes*
Clinical outcome | R² | gene1 | p-value | gene2 | p-value
W-CRP | 0.748 | IL8 | 0.0002 | MMP9 | 0.0002
W-DAS | 0.525 | IL1R1 | 0.0039 | IL8 | 0.0042
W-ESR | 0.818 | TNFSF6 | 5.5E−06 | CD14 | 0.0011
W-HAQ | 0.000
PHYSassessDisease | 0.433 | IL10 | 0.0100
SubAssessDisease | 0.473 | TLR2 | 0.00015 | IL18 | 0.0390
SubAssessPain | 0.224 | GCLC | 0.0250
SwollJoints | 0.570 | IL8 | 0.00044 | TLR2 | 0.0150
TenderJoints | 0.500 | IL1R1 | 0.00063 | CXCL1 | 0.0060
SharpScore | 0.000
*The stepwise ordinal logit modeling procedure entered the most significant predictor (labeled ‘gene1’) and up to one additional predictor (labeled ‘gene2’) in the model.

TABLE 16
Gene | Average sample mean | % CD Standard (s/x̄), s = .2 | s = .5 | s = 1.0 | s = 2.0
TLR2 | 15.87 | 1.26% | 3.15% | 6.30% | 12.60%
CD4 | 14.86 | 0.08% | 0.21% | 0.42% | 0.85%
NFKB1 | 17.29 | 0.00% | 0.01% | 0.02% | 0.05%

TABLE 17 Expected Percentage of Variance of Y Reproduced by Y′
Gene | Squared Correlation (Y, Y′), s = .2 | s = .5 | s = 1.0 | s = 2.0
TLR2 | 94% | 72% | 39% | 14%
CD4 | 88% | 54% | 23% | 7%
NFKB1 | 93% | 67% | 33% | 11%
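If Y′ is formed by adding independent noise e with standard deviation s to a variable Y with variance σ², then Cov(Y, Y′) = σ² and Var(Y′) = σ² + s², so the expected squared correlation is σ⁴ / (σ²(σ² + s²)) = σ²/(σ² + s²). Assuming that is how the error was introduced (Table 19 describes adding error of size s, but the noise distribution is not stated here), the sketch below evaluates the formula with the standard deviations reported for the original data in Table 19. The results agree with Table 17 to within one percentage point; the small differences are consistent with those SDs having been rounded to two decimals.

```python
# Standard deviations of the original (no error added) data, from Table 19.
SD = {"TLR2": 0.79, "CD4": 0.54, "NFKB1": 0.71}
NOISE = (0.2, 0.5, 1.0, 2.0)  # error SDs s used in Tables 16-19


def expected_r2(sigma, s):
    """Expected squared correlation of Y with Y' = Y + e, where
    Var(Y) = sigma^2 and e ~ N(0, s^2) is independent of Y."""
    return sigma ** 2 / (sigma ** 2 + s ** 2)


for gene, sigma in SD.items():
    print(gene, [round(100 * expected_r2(sigma, s)) for s in NOISE])
# TLR2 [94, 71, 38, 13]
# CD4 [88, 54, 23, 7]
# NFKB1 [93, 67, 34, 11]
```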

TABLE 18 Observed Percentage of Variance of Y Reproduced by Y′ using the generated data
Gene | Observed Squared Correlation (Y, Y′), s = .2 | s = .5 | s = 1.0 | s = 2.0
TLR2 | 93% | 76% | 38% | 20%
CD4 | 90% | 57% | 31% | 13%
NFKB1 | 91% | 67% | 40% | 16%
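The observed values in Table 18 come from adding noise to the actual data, which are not reproduced here. As a stand-in, the sketch below draws Y from a normal distribution with the Table 19 mean and SD (an assumption), adds independent N(0, s²) noise, and computes the squared Pearson correlation of Y and Y′. The function names are illustrative. With samples the size of this dataset (155 to 156 subjects), single runs scatter noticeably around σ²/(σ² + s²), which is one plausible reason the observed percentages depart from the expected ones by a few points; with much larger samples the simulated value settles on the formula.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_r2(mean, sd, s, n, seed=0):
    """Squared correlation between Y and Y' = Y + e for one simulated sample.

    Y ~ N(mean, sd^2) stands in for the unavailable original values;
    e ~ N(0, s^2) is drawn independently of Y.
    """
    rng = random.Random(seed)
    y = [rng.gauss(mean, sd) for _ in range(n)]
    y_noisy = [v + rng.gauss(0.0, s) for v in y]
    return pearson_r(y, y_noisy) ** 2


# TLR2-like inputs (mean and SD from Table 19), 155 subjects, s = 2
print(round(simulate_r2(15.87, 0.79, 2.0, n=155, seed=1), 2))
```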

TABLE 19 Summary of results from simulation, including the discrimination R² based on 2 models (Logit model and LC model)

Adding no error (original data): Logit model R² = 1; LC model R² = .73; # misclassified: none
Variable | N | Minimum | Maximum | Mean | Std. Deviation
TLR2 | 155 | 13.49 | 17.60 | 15.87 | 0.79
CD4 | 156 | 13.63 | 16.76 | 14.86 | 0.54
NFKB1 | 156 | 15.70 | 21.94 | 17.29 | 0.71

Adding small amount of error (s = 0.2): Logit model R² = .87; LC model R² = .56; # misclassified: 2 RAs + 2 Norms
Variable | N | Minimum | Maximum | Mean | Std. Deviation
tP2lr | 155 | 13.41 | 17.68 | 15.89 | 0.83
cP2d | 156 | 13.37 | 16.91 | 14.86 | 0.59
nP2fk | 156 | 15.35 | 21.74 | 17.29 | 0.74

Adding moderate amount of error (s = 0.5): Logit model R² = .55; LC model R² = .53; # misclassified: 4 RAs + 8 Norms
Variable | N | Minimum | Maximum | Mean | Std. Deviation
tl5r | 155 | 13.17 | 17.93 | 15.91 | 0.90
c5d | 156 | 12.78 | 16.83 | 14.88 | 0.76
n5fk | 156 | 15.15 | 21.80 | 17.32 | 0.89

Adding large amount of error (s = 1): Logit model R² = .33; LC model R² = .33; # misclassified: 4 RAs + 38 Norms
Variable | N | Minimum | Maximum | Mean | Std. Deviation
tlr | 155 | 12.80 | 19.08 | 15.80 | 1.25
cd | 156 | 12.41 | 17.51 | 15.04 | 1.17
nfk | 156 | 14.22 | 23.06 | 17.23 | 1.20

Adding very large amount of error (s = 2): Logit model R² = .23; LC model R² = .28; # misclassified: 5 RAs + 42 Norms
Variable | N | Minimum | Maximum | Mean | Std. Deviation
tl2r | 155 | 10.41 | 20.50 | 16.00 | 1.93
c2d | 156 | 8.13 | 20.23 | 14.91 | 2.21
n2fk | 156 | 11.42 | 23.63 | 17.09 | 2.11
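Table 19 reports misclassification counts at each noise level. Converting them to the percentage of each group classified correctly makes them comparable with the % RA and % Normal columns of Table 6. The sketch assumes the counts refer to the same 22 RA and 134 Normal subjects (together the 156 in the N column) and reads "none" as zero in both groups; the function name `percent_correct` is illustrative.

```python
N_RA, N_NORMAL = 22, 134  # group sizes assumed for this dataset (156 subjects in total)

# (noise SD s, RAs misclassified, Normals misclassified), read from Table 19
MISCLASSIFIED = [(0.0, 0, 0), (0.2, 2, 2), (0.5, 4, 8), (1.0, 4, 38), (2.0, 5, 42)]


def percent_correct(n_wrong, n_total):
    """Percentage of a group that was classified correctly."""
    return 100.0 * (n_total - n_wrong) / n_total


for s, ra_wrong, normal_wrong in MISCLASSIFIED:
    ra = percent_correct(ra_wrong, N_RA)
    normal = percent_correct(normal_wrong, N_NORMAL)
    print(f"s = {s}: RA {ra:.1f}% correct, Normal {normal:.1f}% correct")
```

Under these assumptions, accuracy among Normals falls below 75% once s reaches 1.0 (71.6%), while accuracy among RA subjects stays above 75% at every noise level (lowest 77.3% at s = 2).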

1. A method of determining whether a human subject is suffering from or is at risk of developing rheumatoid arthritis, based on a blood sample from the subject, the sample providing a source of RNAs, the method comprising: using quantitative amplification to obtain a quantitative measure of the amount of at least 2 constituents as distinct RNA constituents in the subject sample, wherein the first constituent is selected from the group consisting of TLR2, MMP9 and TGFB1, and the second constituent is selected from the group consisting of CD4, PTGS2 and HSPA1A, wherein the constituents are selected so as to distinguish between a normal subject and a rheumatoid arthritis-diagnosed subject with at least 75% accuracy, and wherein such measure for each constituent is obtained under measurement conditions that are (i) within a degree of repeatability of better than five percent and (ii) such that the efficiencies of amplification are within two percent for each constituent, to arrive at a subject profile data set of a plurality of members; assessing the subject profile data set; and comparing the subject profile data set to a baseline profile data set, wherein the baseline profile data set is derived from one or more subjects known not to be suffering from rheumatoid arthritis; wherein a change in the expression pattern of the subject profile data set as compared to the baseline profile data set indicates that the subject is suffering from or is at risk of developing rheumatoid arthritis.
2. The method of claim 1, wherein said rheumatoid arthritis-diagnosed subject is washed out from therapy.
3. The method of claim 1, wherein the constituents are selected so as to permit characterizing the severity of rheumatoid arthritis in relation to a normal subject over time so as to track movement toward normal as a result of successful therapy and away from normal in response to symptomatic flare.
4. The method of claim 1, wherein the first constituent includes TLR2.
5. A method according to claim 1, further comprising obtaining a quantitative measure of one or more constituents selected from the group consisting of IL18BP, HMGB1, C1QA, SERPING1, MYC, NFKB1, TNFSF5, LTA, TGFB1, DPP4, EGR1, IL1R1, ICAM1, IL1RN, TIMP1, MPO, MMP9, TNFSF6, IL1B, IFI16, and SERPINE1.
6. The method of claim 1, wherein the first constituent includes MMP9.
7. A method according to claim 1, further comprising obtaining a quantitative measure of one or more constituents selected from the group consisting of PTGS2, IFI16, C1QA, IL1R1, MYC, SERPINE1, MPO, NFKB1, TGFB1, EGR1, PLAUR, TNFSF5, SERPINA1, LTA, TIMP1, ICAM1, TNF, TLR2, and IL1B.
8. The method of claim 1, wherein the measurement conditions that are substantially repeatable are within a degree of repeatability of better than three percent.
9. The method of claim 1, wherein the efficiency of amplification for all constituents is less than one percent.